Reputation: 549
I am using dask to read a CSV file. However, I can't apply or compute any operation on it because of this error:
Do you have any idea what this error is about and how to fix it?
Upvotes: 0
Views: 1731
Reputation: 549
When dask reads a CSV file, this error occurs if it does not infer the correct dtype for one or more columns.
For example, suppose we read a CSV file with dask as follows:
import dask.dataframe as dd
# Raw string, so that '\f' in the path is not treated as a form-feed escape
df = dd.read_csv(r'\data\file.txt', sep='\t', header='infer')
This prompts the error mentioned above.
To solve this problem, as suggested by @mrocklin in https://github.com/dask/dask/issues/1166, we need to specify the dtype of the columns explicitly. One way to do this is to read the CSV file with pandas, take the dtypes that pandas infers, and pass them as the dtype argument when reading the file with dask.
import pandas as pd
# Let pandas infer the dtypes, then pass them to dask explicitly
df_pd = pd.read_csv(r'\data\file.txt', sep='\t', header='infer')
dt = df_pd.dtypes.to_dict()
df = dd.read_csv(r'\data\file.txt', sep='\t', header='infer', dtype=dt)
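If the file is too large to load fully into pandas, a variation is to infer the dtypes from only the first rows. The following is a minimal sketch of that idea, not something taken from the linked issue: the helper names `infer_dtypes` and `read_with_inferred_dtypes` and the `nrows` default are hypothetical. It also assumes that the mismatch comes from integer columns that contain missing values further down the file, which is one common cause but may not be yours, so it widens integer columns to `float64`.

```python
import pandas as pd


def infer_dtypes(path, sep="\t", nrows=10_000):
    """Infer column dtypes from the first `nrows` rows using pandas.

    Hypothetical helper: the name and the default sample size are
    illustrative choices, not part of pandas or dask.
    """
    sample = pd.read_csv(path, sep=sep, header="infer", nrows=nrows)
    dtypes = {}
    for col, dt in sample.dtypes.items():
        # Assumption: an integer column may have missing values beyond the
        # sample, which int64 cannot hold, so widen it to float64.
        if pd.api.types.is_integer_dtype(dt):
            dtypes[col] = "float64"
        else:
            dtypes[col] = str(dt)
    return dtypes


def read_with_inferred_dtypes(path, sep="\t", nrows=10_000):
    """Read the file lazily with dask, using dtypes inferred by pandas."""
    # Imported here so that infer_dtypes can be used without dask installed.
    import dask.dataframe as dd

    dt = infer_dtypes(path, sep=sep, nrows=nrows)
    return dd.read_csv(path, sep=sep, header="infer", dtype=dt)
```

A call such as `read_with_inferred_dtypes(r'\data\file.txt')` would then replace the three lines above. A small sample can still guess wrong (for example, a column that looks numeric at the top but holds text later), so for a column you know to be problematic it is safer to set its dtype by hand in the dictionary.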
Upvotes: 4