Alger Remirata

Reputation: 549

Value Error on Dask DataFrames

I am using dask to read a CSV file. However, I couldn't apply or compute any operation on it because of this error:

[image: screenshot of the error]

Do you have any idea what this error is about and how to fix it?

Upvotes: 0

Views: 1731

Answers (1)

Alger Remirata

Reputation: 549

When reading a CSV file with dask, errors occur if dask does not infer the correct dtype for the columns.

For example, we read a csv file using dask as follows:

import dask.dataframe as dd

df = dd.read_csv(r'\data\file.txt', sep='\t', header='infer')

This prompts the error mentioned above.

To solve this problem, as suggested by @mrocklin in this comment, https://github.com/dask/dask/issues/1166, we need to specify the dtype of the columns explicitly. We can do this by reading the CSV file with pandas, identifying the data types, and passing them as the dtype argument when reading the CSV with dask.

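import pandas as pd

# Read with pandas first to infer the column dtypes; raw strings keep the backslashes literal
df_pd = pd.read_csv(r'\data\file.txt', sep='\t', header='infer')
dt = df_pd.dtypes.to_dict()
df = dd.read_csv(r'\data\file.txt', sep='\t', header='infer', dtype=dt)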
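If the file is too large to load fully in pandas, a possible variation is to infer the dtypes from only the first rows. This is just a sketch: the helper names infer_dtypes and read_with_dask and the sample_rows default are hypothetical, and it assumes the sampled rows are representative. A column that looks like integers in the sample but contains missing values or text further down would still fail, and would need its dtype set by hand.

```python
import pandas as pd


def infer_dtypes(path, sep="\t", sample_rows=1000):
    """Infer column dtypes from the first `sample_rows` rows using pandas.

    Hypothetical helper: assumes the sampled rows are representative
    of the whole file.
    """
    sample = pd.read_csv(path, sep=sep, header="infer", nrows=sample_rows)
    return sample.dtypes.to_dict()


def read_with_dask(path, sep="\t", sample_rows=1000):
    """Read the file with dask, passing the dtypes inferred by pandas."""
    # Imported here so infer_dtypes can be used without dask installed.
    import dask.dataframe as dd

    dt = infer_dtypes(path, sep=sep, sample_rows=sample_rows)
    return dd.read_csv(path, sep=sep, header="infer", dtype=dt)
```

Usage would be something like df = read_with_dask(r'\data\file.txt'). If one column is still misdetected, you can override its entry in the dict returned by infer_dtypes before passing it to dd.read_csv.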

Upvotes: 4
