arno_v

Reputation: 20267

Reading data that doesn't fit into memory with Dask

I have a big file (25 GB) that doesn't fit into memory. I want to do some operations on it with Dask. I have tried two approaches, but both fail with memory errors.

Approach 1

>>> import dask.dataframe as dd
>>> df = dd.read_json('myfile.jsonl', lines=True)
MemoryError:

Approach 2

>>> # split the file into 12 pieces with the unix split command,
>>> # each of which fits in memory by itself
>>> import dask.dataframe as dd
>>> df = dd.read_json('myfile_split.*', lines=True)
ValueError: Could not reserve memory block

What am I doing wrong here?

Upvotes: 0

Views: 1027

Answers (1)

MRocklin

Reputation: 57271

I recommend using the blocksize= keyword argument:

df = dd.read_json(..., lines=True, blocksize="32 MiB")
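A minimal end-to-end sketch of what that might look like (the column name below is just a placeholder for illustration):

>>> import dask.dataframe as dd
>>> # blocksize splits the line-delimited JSON into roughly 32 MiB partitions,
>>> # so Dask never has to hold the whole 25 GB file in memory at once
>>> df = dd.read_json('myfile.jsonl', lines=True, blocksize="32 MiB")
>>> df.npartitions  # roughly file size / blocksize
>>> df['some_column'].value_counts().compute()  # placeholder column; work runs partition by partition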

Upvotes: 1
