Ank

Reputation: 1904

Pandas read specific number of columns irrespective of data file

I am reading a CSV file into a pandas DataFrame like so:

import pandas as pd

num_cols = 80
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', usecols=range(num_cols), low_memory=False)

The number of columns varies between files, but I only want the first 80, which is why I am using usecols. This works for files with 80 or more columns, but for files with fewer than 80 columns it raises an error like this:

Usecols do not match columns, columns expected but not found: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]

How can I fix it? Ideally, I would always get 80 columns, with any missing trailing columns simply filled with NaN.

Upvotes: 1

Views: 2131

Answers (2)

Ank

Reputation: 1904

As many pointed out, it's better to read all the data first and then select the first 80 columns. I achieved this by passing names=range(100), since no row in my files has more than 100 columns:

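import pandas as pd

num_cols = 80
# names=range(100) yields 100 columns; columns missing from the file are filled with NaN
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', names=range(100), low_memory=False)
df = df.iloc[:, :num_cols]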
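
To illustrate the padding behaviour, here is a minimal sketch that uses small in-memory CSV strings instead of a zip file. The helper name read_first_cols and its max_cols parameter are hypothetical, and the sketch assumes no file ever has more than max_cols columns.

```python
import io

import pandas as pd


def read_first_cols(source, num_cols=80, max_cols=100):
    """Read a headerless CSV and keep its first num_cols columns.

    Assumption: the file never has more than max_cols columns.
    """
    # Supplying more names than the file has columns makes pandas
    # create the extra columns and fill them with NaN.
    df = pd.read_csv(source, header=None, sep=',', names=range(max_cols))
    # Trim down to the columns we actually want.
    return df.iloc[:, :num_cols]


# A file with fewer columns than requested: columns 3 and 4 are all NaN.
narrow = read_first_cols(io.StringIO("1,2,3\n4,5,6\n"), num_cols=5, max_cols=8)

# A file with more columns than requested: the extras are dropped.
wide = read_first_cols(io.StringIO("1,2,3,4,5,6,7\n"), num_cols=5, max_cols=8)

print(narrow)
print(wide)
```

Both frames end up with exactly five columns, which matches the behaviour asked for in the question.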

Upvotes: 1

Andreas

Reputation: 9207

You could try a workaround using the nrows=1 parameter:

import pandas as pd

num_cols = 80
# Read only the first row to find out how many columns the file has
cols = len(pd.read_csv(read_path, compression='zip', header=None, sep=',', low_memory=False, nrows=1).columns)
# Never request more columns than the file contains
num_cols = min(num_cols, cols)
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', usecols=range(num_cols), low_memory=False)

This loads only the first row to count the columns, then reads the whole file with usecols capped at that count.
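
Note that with this workaround a narrow file yields a frame with fewer than 80 columns, whereas the question asks for NaN-filled trailing columns. One possible way to close that gap is to follow the read with DataFrame.reindex. The sketch below is only an illustration: read_padded is a hypothetical helper, and the temporary demo file stands in for the real zipped CSV.

```python
import os
import tempfile

import pandas as pd


def read_padded(path, num_cols=80, **read_kwargs):
    """Read the first num_cols columns, padding missing ones with NaN."""
    # First pass: read a single row just to count the columns in the file.
    available = len(pd.read_csv(path, header=None, nrows=1, **read_kwargs).columns)
    # Second pass: request only columns that actually exist.
    df = pd.read_csv(path, header=None,
                     usecols=range(min(num_cols, available)), **read_kwargs)
    # Add any missing trailing columns; reindex fills them with NaN.
    return df.reindex(columns=range(num_cols))


# Demonstration on a temporary 3-column file (stand-in for the real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w") as fh:
        fh.write("1,2,3\n4,5,6\n")
    padded = read_padded(demo_path, num_cols=5)

print(padded)
```

For the files in the question, the call would presumably look like read_padded(read_path, num_cols=80, compression='zip', low_memory=False).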

Upvotes: 1
