nrw.social is one of many independent Mastodon servers you can use to participate in the fediverse.
We are a friendly Mastodon instance from North Rhine-Westphalia. Whether you're from NRW or just an NRW sympathiser, everyone is welcome here.

Server statistics:

2.8K
active profiles

#Dask


#DuckDB (and a tonne of RAM) have absolutely saved my behind these last few months while dealing with huge biological datasets.

If you do any #data munging at all on a daily basis, it's well worth picking up DuckDB. Don't let the DB part fool you: it's more like #Dask or #Spark, but #SQL.

duckdb.org/

DuckDB · An in-process SQL OLAP database management system. Simple, feature-rich, fast & open source.
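
For anyone curious what that looks like in practice, here's a minimal, hypothetical sketch of using DuckDB from Python. The file name and columns are made up, and it assumes duckdb and pandas are installed:

```python
# Hypothetical sketch: querying a Parquet file with DuckDB from Python.
# File name and columns are invented; assumes "pip install duckdb pandas".
import duckdb

con = duckdb.connect()  # in-process: no server, just a library call

# DuckDB scans Parquet/CSV files directly, much like Dask or Spark would,
# but the interface is SQL rather than a dataframe API.
top_samples = con.execute(
    """
    SELECT sample_id, AVG(expression) AS mean_expr
    FROM 'measurements.parquet'
    GROUP BY sample_id
    ORDER BY mean_expr DESC
    LIMIT 10
    """
).fetchdf()  # returns a pandas DataFrame

print(top_samples)
```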
Continued thread

I've improved my Stack Overflow question and added a bounty. I'm once again asking the amazing #python, #dask, and #Django community if you could offer some of your knowledge to me and the world 🤟🐍 I suppose this might just be a #Dask question, but I am boosting it to reach anyone who might lend a hand 💚 stackoverflow.com/q/79198230

Stack Overflow · Django + Dask integration: usage and progress? About performance & best practice. Note, the entire code for the question below is public on GitHub. Feel free to check out the project! https://github.com/b-long/moose-dj-uv/pull/3 I'm trying...

I am moving all my computing libraries to #xarray, no regrets. It is a natural way to manipulate datasets of rectangular arrays, with named coordinates and dimensions: xarray.dev/
There are several possible backends, including #dask, which allows lazy data loading.
I had the pleasure of meeting some of the devs last week, who showed me a preview of the upcoming `DataTree` structure which is going to make this library even more versatile!

xarray.dev · Xarray: N-D labeled arrays and datasets in Python
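
To make the lazy-loading point concrete, a minimal sketch; the dataset name, variable, and chunk sizes are all hypothetical, and it assumes xarray, dask, and netcdf4 are installed:

```python
# Hypothetical sketch of xarray with a dask backend: labelled dimensions
# plus lazy, chunked loading. Nothing below is read from disk until
# .compute() is called.
import xarray as xr

# Passing chunks=... backs the variables with dask arrays.
ds = xr.open_dataset("ocean_temps.nc", chunks={"time": 100})

# Operations use named dimensions ("time") rather than axis numbers,
# and at this point they only build a task graph.
monthly_mean = ds["temperature"].groupby("time.month").mean("time")

print(monthly_mean.compute())  # evaluation happens here, chunk by chunk
```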

Good morning folks! It's been a while since I did one of my #TwitterMigration #Introduction #ConnectionList posts where I curate interesting people for you to follow on the #Fediverse :fediverse:

Today, I'd like you to meet:

@LMonteroSantos Lola is a #PhD #researcher at #EUI interested in #data #regulation, the digital #economy and #AntiTrust, and passionate about #DataScience and #programming. New to Mastodon, please make her welcome 👋 🇪🇺

@danlockton is a #Professor at @TUEindhoven where he works in #design, #imagination and #climate #futures. He often posts interesting things around co-design and #collaboration 🇳🇱

@1sabelR is a #researcher @ANUResearch where she is into #SolarPunk and @scicomm 🇦🇺 She co-hosts the #SciBurst #podcast - worth a listen!

@timrichards is a #travel #writer based in #Naarm / #Melbourne in Australia, specialising in #rail 🇦🇺

@microstevens is a #DataScience facilitator at #UWMadison and she works in #OpenScience and #genomics 💻 🧬

@mrocklin does amazing things with #dask in #python, and I am very grateful in recent weeks for his posts and #StackOverflow responses. Thank you 🙏 🐍

@everythingopen is Australia's premier open #technology conference, covering #linux, #OpenSource, #OpenData, #OpenGov, #OpenGLAM, #OpenScience and everything else open. You should check it out! 🐧 🇦🇺

That's all for today - don't forget to share your own lists so we can more richly connect the :fediverse: and curate the conversations we want to have ❤️

I'm making my first foray into #dask - I've done the tutorial and read what I can on Stack Overflow.

It's definitely a steep learning curve, but it's been very interesting so far.

@holden's excellent book has been very useful so far, and I think the more I work with it, the more I will master the nuances: how to set up the Client scheduler with the optimal number of workers and threads, the optimal partitioning, etc.
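
For anyone on the same learning curve, here is roughly what those knobs look like. The numbers are illustrative placeholders, not recommendations, and assume dask[distributed] and pandas are installed:

```python
# Illustrative sketch of the tuning knobs mentioned above; the values
# are placeholders, not recommendations.
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

# The Client starts a local scheduler. n_workers vs. threads_per_worker
# is the main trade-off between CPU-bound and I/O-bound workloads.
client = Client(n_workers=4, threads_per_worker=2, memory_limit="2GB")

# Partition count is the other knob: too few partitions leaves workers
# idle, too many adds scheduling overhead.
pdf = pd.DataFrame({"x": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)

print(ddf.x.mean().compute())
client.close()
```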

So I'm almost finished with my first independent implementation of a standard, and I want to write up the process because it was surprisingly challenging and I learned a lot about how to write them.

I was purposefully experimenting with different methods of translation (e.g. adapter classes vs. pure functions in a build pipeline, recursive functions vs. flattening everything), so the code isn't as sleek as it could be. I had planned on this beforehand, but two major things I learned were a) not just isolating special cases, but creating specific means to organize them and make them visible, and b) isolating different layers of the standard (e.g. the schema language is separate from the models, which are separate from I/O) and not backpropagating special cases between layers.

This is also my first project that's fully in the "new style" of Python that's basically a typed language with validating classes, and it makes you write differently but uniformly, for the better: it's almost self-testing, because if all the classes validate in an end-to-end test then you know that shit is working as intended. Forcing yourself to deal with errors immediately is the way.
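
As an illustration of that style (a made-up model, not anything from the actual project), validation happens at construction time with pydantic:

```python
# Made-up example of the "typed Python with validating classes" style;
# not code from the project itself. Assumes pydantic v2 is installed.
from pydantic import BaseModel, Field, field_validator


class Electrode(BaseModel):
    """One recording site; invalid data fails at construction time."""
    id: int
    impedance_ohms: float = Field(gt=0)  # must be positive
    location: str

    @field_validator("location")
    @classmethod
    def location_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("location must be non-empty")
        return v


# Constructing the model *is* the test: if every class validates in an
# end-to-end run, the data conforms to the schema.
probe = Electrode(id=1, impedance_ohms=1.2e6, location="CA1")
print(probe.model_dump())
```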

Lots more to say, but anyway, we're about two days of work away from a fully independent translation of #NWB to #LinkML that uses @pydantic models + #Dask for arrays. Schema extensions are now no-code: just write the schema (in NWB schema language or LinkML) and, poof, you can use it. Hoping this makes it way easier for tools to integrate with NWB, and my next step will be to put them in a SQL database and triple store so we can, y'know, more easily share and grab smaller pieces of them and index across lots of datasets.
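
For readers who haven't used Dask for arrays, the appeal is chunked, lazy evaluation: arrays bigger than memory still behave like one array. A generic sketch, with shapes and chunk sizes that are illustrative only:

```python
# Generic sketch of dask arrays: chunked and lazily evaluated, so the
# full array never has to fit in memory at once. Assumes dask is installed.
import dask.array as da

# 100k x 1k array defined in 1k x 1k chunks; nothing is materialised yet.
x = da.random.random((100_000, 1_000), chunks=(1_000, 1_000))

col_means = x.mean(axis=0)  # recorded as a task graph, not computed

print(col_means.compute()[:5])  # computed chunk by chunk on demand
```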

Then, uh, we'll bridge our data archives + notebooks with the fedi for a new kind of scholarly communication....