Reputation: 1904
I am reading a csv file as pandas dataframe like so:
num_cols = 80
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', usecols=range(num_cols), low_memory=False)
Here, the number of columns is not fixed, but I want to read only the first 80 columns, which is why I am using usecols. For files with more than 80 columns this works fine, but for files with fewer than 80 columns it throws this kind of error:
Usecols do not match columns, columns expected but not found: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
How can I fix it? Ideally, I would like to read first 80 columns (the last few columns will simply contain NaNs).
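For reference, the error is easy to reproduce on a small in-memory CSV (io.StringIO stands in here for the actual zipped file at read_path): asking usecols for column indices beyond what the file contains raises a ValueError.

```python
import io
import pandas as pd

# Hypothetical 3-column CSV standing in for a file that has
# fewer columns than usecols requests.
data = io.StringIO("1,2,3\n4,5,6\n")

err = None
try:
    # Requesting 5 columns from a 3-column file fails.
    pd.read_csv(data, header=None, sep=',', usecols=range(5))
except ValueError as e:
    err = e

print(err)  # "Usecols do not match columns, ..."
```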
Upvotes: 1
Views: 2131
Reputation: 1904
As many pointed out, it's better to read all the data first and then select the first 80 columns. I achieved this by using names=range(100)
, since no row will have more than 100 columns, like so:
num_cols = 80
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', names=range(100), low_memory=False)
df = df.iloc[:, :num_cols]
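A minimal sketch of why this works, using a toy 3-column CSV via io.StringIO in place of the zipped read_path: passing more names than the file has columns makes pandas pad the missing trailing columns with NaN, so slicing to the first 80 always succeeds.

```python
import io
import pandas as pd

num_cols = 80

# Toy CSV with only 3 columns; names=range(100) reserves 100
# column labels, so columns 3..99 come back as all-NaN.
data = io.StringIO("1,2,3\n4,5,6\n")
df = pd.read_csv(data, header=None, sep=',', names=range(100))
df = df.iloc[:, :num_cols]

print(df.shape)            # (2, 80)
print(df[79].isna().all()) # True: the padded columns are NaN
```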
Upvotes: 1
Reputation: 9207
You could try a workaround using the nrows=1
parameter:
num_cols = 80
cols = len(pd.read_csv(read_path, compression='zip', header=None, sep=',', low_memory=False, nrows=1).columns)
if cols < num_cols:
    num_cols = cols
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', usecols=range(num_cols), low_memory=False)
Load only the first row, check the number of columns, then read the whole file while requesting no more columns than actually exist.
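The two-pass idea can be sketched on a small in-memory CSV (io.StringIO standing in for the zipped read_path; with a real file you would just pass the same path twice):

```python
import io
import pandas as pd

num_cols = 80
raw = "1,2,3\n4,5,6\n"  # toy 3-column file

# First pass: read a single row just to count the columns.
cols = len(pd.read_csv(io.StringIO(raw), header=None, sep=',',
                       nrows=1).columns)

# Never ask usecols for more columns than the file has.
if cols < num_cols:
    num_cols = cols

# Second pass: read the full file with a safe usecols range.
df = pd.read_csv(io.StringIO(raw), header=None, sep=',',
                 usecols=range(num_cols))
print(df.shape)  # (2, 3)
```

Note this reads the file twice, so for very large archives the names=range(100) approach avoids the extra pass.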
Upvotes: 1