Using usecols and skiprows at the same time (in Pandas read_csv) gives error

Question

I am using read_csv in Pandas v0.18.1 to read in some data. I want to choose a subset of columns and rows from the csv, so I have tried:

df_a = pd.read_csv(filepath, index_col=False, usecols=cols_to_use, skiprows=1)

This gives me a ValueError: Usecols do not match names. Note that cols_to_use is a list of column names. If I leave out the skiprows part:

df_a = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)

it works fine. Similarly, if I leave out usecols and put skiprows back in, that also works.
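The behaviour described above can be reproduced without a file on disk. This is a minimal sketch under stated assumptions: the CSV text, the column names, and the helper `try_read` are hypothetical stand-ins for `filepath` and `cols_to_use`, and `io.StringIO` replaces the real file. It only records whether each combination of options succeeds or raises `ValueError`; the exact error message differs between pandas versions.

```python
import io

import pandas as pd

# Hypothetical CSV whose first line (line 0) holds the column names.
CSV_TEXT = "name,age,city\nalice,30,paris\nbob,25,rome\ncarol,41,oslo\n"
cols_to_use = ["name", "city"]  # hypothetical stand-in for the question's list


def try_read(**kwargs):
    """Return 'ok' if read_csv succeeds with these options, else the exception type name."""
    try:
        pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)
        return "ok"
    except ValueError as exc:
        return type(exc).__name__


results = {
    "usecols only": try_read(usecols=cols_to_use),
    "skiprows only": try_read(skiprows=1),
    "usecols + skiprows": try_read(usecols=cols_to_use, skiprows=1),
}
for combo, outcome in results.items():
    print(f"{combo}: {outcome}")
```

Each option on its own reports "ok", while the combination reports "ValueError", matching what the question describes.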

Could this be a bug (that you can't use usecols and skiprows at the same time)? I've tried looking in the documentation but couldn't find any mention of it. Or perhaps there is a logical reason that you can't use both?

(Also, if there is a better or more obvious way to pick out a subset of columns and rows from a CSV, that would be appreciated too!)

Thanks in advance!

Fabian Rost · Accepted Answer

If the first row of your CSV file contains the column names, then skiprows=1 skips that header row. read_csv then treats the first data row as the header, so the names in cols_to_use no longer match any column and you run into the error.
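To see this header shift concretely, here is a minimal sketch using hypothetical in-memory data (the CSV text and column names are illustrative, not from the question). It compares the column labels pandas infers with and without an integer skiprows.

```python
import io

import pandas as pd

# Hypothetical CSV: line 0 is the header, lines 1-3 are data.
CSV_TEXT = "name,age,city\nalice,30,paris\nbob,25,rome\ncarol,41,oslo\n"

# Default behaviour: line 0 supplies the column names.
normal = pd.read_csv(io.StringIO(CSV_TEXT))
print(normal.columns.tolist())   # ['name', 'age', 'city']

# An integer skiprows drops the first N lines of the file, header included,
# so line 1 (the 'alice' row) is promoted to the header.
shifted = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1)
print(shifted.columns.tolist())  # ['alice', '30', 'paris']
print(len(shifted))              # 2
```

Because 'name' and 'city' are not among the shifted labels, asking for them via usecols fails.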

If you want to skip specific rows, you can pass the row numbers as a list, e.g. skiprows=[1]. The line numbers are 0-indexed, so the column names are on line 0 and the first data line is line 1.
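A minimal sketch of the list form combined with usecols, again assuming hypothetical in-memory data in place of `filepath`. Because line 0 is not in the list, the header is kept and usecols can still find its columns. The second call uses `range(1, k + 1)`, a list-like that read_csv also accepts, to skip the first k data rows while keeping the header.

```python
import io

import pandas as pd

# Hypothetical CSV: line 0 is the header, lines 1-3 are data.
CSV_TEXT = "name,age,city\nalice,30,paris\nbob,25,rome\ncarol,41,oslo\n"
cols_to_use = ["name", "city"]  # hypothetical column subset

# Skip only line 1 (the first data row); line 0 stays as the header.
df_a = pd.read_csv(io.StringIO(CSV_TEXT), usecols=cols_to_use, skiprows=[1])
print(df_a["name"].tolist())  # ['bob', 'carol']

# Skip the first two data rows (lines 1 and 2), still keeping the header.
df_b = pd.read_csv(io.StringIO(CSV_TEXT), usecols=cols_to_use, skiprows=range(1, 3))
print(df_b["name"].tolist())  # ['carol']
```

Passing line numbers instead of a count is what makes the two options compatible: the header line is never among the skipped lines.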


