Ícaro Lorran

Reputation: 96

Split strings into a huge amount of columns in dask

I have a dask Series X filled with strings containing a lot of text, and I want to split them into columns. This is what I was doing:

cols = 2867847
W = X.str.split(n=cols, expand=True)  # X has 3320 rows and npartitions=1000

I can't simply increase the number of partitions to account for the number of columns, because dask partitions the DataFrame row-wise. Is it possible to partition over the columns instead?

Upvotes: 0

Views: 138

Answers (1)

MRocklin

Reputation: 57281

It is unusual to use pandas-style dataframes with thousands of columns. Perhaps some other API would suit your situation better? Maybe dask.delayed, dask.bag, or xarray?
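The following is a minimal sketch of the dask.bag route, one of the options listed above. The toy strings and `npartitions=2` are assumptions for illustration. Each row stays a single record holding a Python list of tokens, so nothing has to be laid out as millions of columns:

```python
import dask.bag as db

# Hypothetical input rows; in practice these would be the long text strings
texts = ["the quick brown fox", "jumps over", "the lazy dog"]
b = db.from_sequence(texts, npartitions=2)

# One record per row: a list of tokens of any length, with no fixed column count
tokens = b.map(str.split)

# The synchronous scheduler keeps this small example single-process
token_lists = tokens.compute(scheduler="synchronous")
total_tokens = tokens.flatten().count().compute(scheduler="synchronous")

print(token_lists)
print(total_tokens)
```

Rows of different lengths are fine in this representation. By contrast, a wide DataFrame built with `expand=True` would pad the shorter rows with missing values.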

Upvotes: 1
