silentninja89

Reputation: 201

Error while trying to compute dask dataframe

I've been trying to call compute() on a dataframe that I have, but it keeps raising the following error:

ValueError: Usecols do not match columns, columns expected but not found: ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6', 'COL7']

import dask.dataframe as dd


use_cols = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6', 'COL7']

ddframe = dd.read_csv('26367*', skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|', usecols = use_cols)
ddframe.compute()

How can I resolve this issue? Thanks in advance

Upvotes: 1

Views: 526

Answers (1)

SultanOrazbayev

Reputation: 16581

Possibly one of the globbed files does not contain the specified columns. An easy way to check this is to print:

print(dd.read_csv('26367*', skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|').columns)

If the above yields an error, then you might want to explore the globbed files:

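import glob
import dask.dataframe as dd

# Print the columns detected in each matching file, one file at a time
for f in glob.glob('26367*'):
    print(f, dd.read_csv(f, skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|').columns)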

This will show if the columns are consistently defined in the files.
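
If there are many files, it may be easier to list only the ones that are missing columns. The following is a rough sketch that uses just the standard library rather than dask. The function name find_missing_columns is hypothetical, and it assumes that the header is the first row after the seven skipped rows and that '|' is the separator, as in the question. It only approximates how read_csv locates the header (for example, it does not skip blank lines), so treat the result as a hint.

```python
import csv
import glob
from itertools import islice


def find_missing_columns(pattern, expected, skip=7, sep="|"):
    """Map each matching file to the expected columns absent from its header.

    Hypothetical helper: assumes the header is the first row after
    `skip` preamble rows and that fields are separated by `sep`.
    """
    report = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            reader = csv.reader(fh, delimiter=sep)
            # Skip the preamble rows, then take the next row as the header
            # (an empty list if the file has no such row).
            header = next(islice(reader, skip, None), [])
        # Names are compared exactly, so stray whitespace counts as a mismatch.
        missing = [col for col in expected if col not in header]
        if missing:
            report[path] = missing
    return report


if __name__ == "__main__":
    use_cols = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6', 'COL7']
    for path, missing in find_missing_columns('26367*', use_cols).items():
        print(path, "is missing", missing)
```

Any file reported here is a candidate for the one that triggers the ValueError. An empty result suggests the problem lies elsewhere, for instance in how the rows are being skipped.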

Upvotes: 1
