Pandas read specific number of columns irrespective of data file

Question

I am reading a CSV file into a pandas DataFrame like so:

import pandas as pd

num_cols = 80
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', usecols=range(num_cols), low_memory=False)

The number of columns varies from file to file, but I only want to read the first 80, which is why I am using usecols. This works fine for files with 80 or more columns, but for files with fewer than 80 columns it throws an error like this:

Usecols do not match columns, columns expected but not found: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]

How can I fix it? Ideally, I would like to always get 80 columns back, with the missing trailing columns simply filled with NaN.

Ank · Accepted Answer

As many pointed out, it's better to read all the data first and then select the first 80 columns. I achieved this by passing names=range(100), since no row will have more than 100 columns, like so:

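import pandas as pd

num_cols = 80
# names=range(100) gives every file 100 columns; missing ones are filled with NaN
df = pd.read_csv(read_path, compression='zip', header=None, sep=',', names=range(100), low_memory=False)
# keep only the first num_cols columns
df = df.iloc[:, :num_cols]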
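
For anyone who wants to try this without a zip file on hand, here is a minimal sketch of the same idea on in-memory data. The helper name read_first_columns, the max_cols parameter and the sample CSV strings are made up for illustration and are not from the original code. It assumes max_cols really is an upper bound on the widest row in the file, and it drops compression='zip' only because the sample input is a string buffer.

```python
import io

import pandas as pd


def read_first_columns(source, num_cols, max_cols):
    """Read a headerless CSV and return exactly its first num_cols columns.

    Hypothetical helper: max_cols is an assumed upper bound on the
    widest row in the file.
    """
    if num_cols > max_cols:
        raise ValueError("num_cols must not exceed max_cols")
    # Supplying max_cols integer names makes pandas pad short rows with NaN.
    df = pd.read_csv(source, header=None, sep=",", names=range(max_cols))
    # Keep only the leading num_cols columns.
    return df.iloc[:, :num_cols]


# Made-up sample data: one file narrower and one wider than num_cols.
narrow = read_first_columns(io.StringIO("1,2,3\n4,5,6\n"), num_cols=5, max_cols=10)
wide = read_first_columns(
    io.StringIO("1,2,3,4,5,6,7\n8,9,10,11,12,13,14\n"), num_cols=5, max_cols=10
)

print(narrow.shape, wide.shape)
```

Both frames come back with five columns. In the narrow case the last two columns contain only NaN, which is the behaviour asked for in the question.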
