Reputation: 136197
I just tried
import dask.dataframe as dd
df = dd.read_csv("data.csv")
print(df.describe())
which gives
Dask DataFrame Structure:
              SOME_COL      FOO      BAR
npartitions=1  float64  float64  float64
                   ...      ...      ...
Dask Name: describe, 1234 tasks
There are two problems:
What is the problem?
Upvotes: 0
Views: 872
Reputation: 446
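Calling dd.read_csv() on its own does very little: it only sets up a lazy Dask DataFrame and does not load the data. To actually read the CSV and get a concrete result, call the .compute() method, which returns a pandas object.
In other words, Dask is lazy. If your CSV file is only about 4 GB and you have enough RAM, you may be able to read it directly with pandas, in chunks if necessary (the chunksize parameter of pandas.read_csv). You can also set low_memory=False in pandas.read_csv.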
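As a rough sketch of the chunked pandas approach, something like the following should work. The in-memory CSV, the column names FOO and BAR, and the chunk size of 2 are made up for illustration; with a real file you would pass its path and a much larger chunksize.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
csv_data = io.StringIO("FOO,BAR\n1.0,10.0\n2.0,20.0\n3.0,30.0\n4.0,40.0\n")

total = 0.0
count = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["FOO"].sum()
    count += len(chunk)

mean_foo = total / count
print(mean_foo)
```

This only helps for statistics that can be accumulated chunk by chunk (sums, counts, min/max); exact quantiles need the whole column at once.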
Upvotes: 0
Reputation: 57251
Dask.dataframe is lazy by default. You need to call .compute() when you want a concrete result:
print(df.describe().compute())
Upvotes: 1