Reputation: 201
I've been trying to call compute() on a Dask dataframe that I have, but it keeps giving me the following error:
ValueError: Usecols do not match columns, columns expected but not found: ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6', 'COL7']
import dask.dataframe as dd
use_cols = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5', 'COL6', 'COL7']
ddframe = dd.read_csv('26367*', skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|', usecols = use_cols)
ddframe.compute()
How can I resolve this issue? Thanks in advance
Upvotes: 1
Views: 526
Reputation: 16581
Possibly one of the globbed files does not contain the specified columns. An easy way to check this is to print:
print(dd.read_csv('26367*', skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|').columns)
If the above yields an error, then you might want to explore the globbed files:
import glob
for f in glob.glob('26367*'):
    # Print the file name next to its columns so a mismatching file is easy to spot
    print(f, dd.read_csv(f, skiprows=[0, 1, 2, 3, 4, 5, 6], sep = '|').columns)
This will show if the columns are consistently defined in the files.
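To narrow it down further, the same check can report only the files that lack some of the expected columns. The sketch below is a minimal illustration, not part of the original answer: it uses plain Python instead of dask, the helper name missing_columns_by_file is hypothetical, and it assumes the header sits on the line immediately after the skipped rows and contains no quoted separators.

```python
import glob

def missing_columns_by_file(pattern, expected, skip=7, sep="|"):
    """Map each matching file to the expected columns absent from its header.

    Assumption: the header is the first line after `skip` skipped lines,
    mirroring skiprows=[0, 1, 2, 3, 4, 5, 6] in the question.
    """
    report = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            # Discard the preamble lines that read_csv was told to skip
            for _ in range(skip):
                next(fh, None)
            header_line = next(fh, "")
        # Names are compared exactly, so stray whitespace counts as a mismatch
        header = header_line.rstrip("\r\n").split(sep)
        missing = [col for col in expected if col not in header]
        if missing:
            report[path] = missing
    return report
```

Calling missing_columns_by_file('26367*', use_cols) returns a dict keyed by file path; an empty dict suggests every header contains all the requested columns, in which case the skiprows or sep settings are the next thing to check.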
Upvotes: 1