AButkov

Reputation: 455

Keeping columns in the specified order when using usecols in pandas read_csv

I have a CSV file with 50 columns of data. I am using the pandas read_csv function to pull in a subset of these columns, using the usecols parameter to choose the ones I want:

import pandas as pd

cols_to_use = [0, 1, 5, 16, 8]  # integer column positions, in the order I want
df_ret = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)

The trouble is that df_ret contains the correct columns, but not in the order I specified: they come back in ascending order, [0, 1, 5, 8, 16]. (The column numbers can change from run to run; this is just an example.) This is a problem because the rest of the code has arrays that are in the "correct" order, and I would rather not have to reorder all of them.
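A minimal reproduction of the behaviour, assuming a small in-memory CSV (the five-column data below is hypothetical, standing in for the 50-column file). It shows that usecols only decides which columns are parsed; the result keeps the file's own column order.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real 50-column file.
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"

cols_to_use = [0, 1, 4, 2]  # deliberately not in ascending order
df_ret = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)

# The selected columns come back in file order, not in the order of cols_to_use.
print(list(df_ret.columns))  # ['a', 'b', 'c', 'e']
```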

Is there any clever pandas way of pulling in the columns in the order specified? Any help would be much appreciated!

Upvotes: 27

Views: 14482

Answers (2)

MaxU - stand with Ukraine

Reputation: 210882

You can reuse the same cols_to_use list to select the columns in the desired order, provided cols_to_use contains column names:

df_ret = pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[cols_to_use]
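A self-contained sketch of this name-based approach, assuming hypothetical header names a to e in an in-memory CSV instead of a real filepath:

```python
import io

import pandas as pd

# Hypothetical CSV with named header columns.
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"

cols_to_use = ["a", "b", "e", "c"]  # column names, in the order we want

# usecols picks the columns; indexing with the same list then fixes their order.
df_ret = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)[cols_to_use]

print(list(df_ret.columns))  # ['a', 'b', 'e', 'c']
```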

NOTE: this solution works only if cols_to_use contains column names. It won't work with integer column positions: df[...] selects by label, so the integers would be looked up among the header names, and even positionally the DataFrame returned with usecols has fewer columns than the CSV file, so the original positions no longer line up.
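If you have to stay with integer positions, one possible workaround (a sketch, not part of the answer above) relies on the fact that the parsed columns come back in ascending file position. Each requested position p therefore sits at rank sorted(cols_to_use).index(p), which can be fed to iloc. It assumes cols_to_use has no duplicates and that the CSV data below is a hypothetical stand-in.

```python
import io

import pandas as pd

# Hypothetical stand-in data.
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"
cols_to_use = [0, 1, 4, 2]  # file positions in the desired order (assumed unique)

df = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)

# df holds the selected columns in ascending file position, so look up
# where each requested position ended up and reorder positionally.
ascending = sorted(cols_to_use)
df_ret = df.iloc[:, [ascending.index(p) for p in cols_to_use]]

print(list(df_ret.columns))  # ['a', 'b', 'e', 'c']
```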

Upvotes: 25

PeptideWitch

Reputation: 2359

Just piggybacking off this question here (hi from 2018).

I ran into the same problem with read_csv and wanted a way to do the reordering step (the trailing [...] indexing) using column header strings. It's as simple as defining a list of the header strings in the order you want (index_strings below) and indexing the result with it:

pd.read_csv(filepath, index_col=False, usecols=cols_to_use)[index_strings]
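When only integer positions are known, index_strings can be derived from the header first. A minimal sketch, assuming the file has a header row: reading with nrows=0 returns just the column names, which map each position to its string. The data is a hypothetical in-memory CSV; with a real filepath the file would simply be opened twice.

```python
import io

import pandas as pd

# Hypothetical stand-in data.
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"
cols_to_use = [0, 1, 4, 2]  # file positions in the desired order

# Read only the header row to translate positions into column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
index_strings = [header[p] for p in cols_to_use]

df_ret = pd.read_csv(io.StringIO(csv_text), index_col=False, usecols=cols_to_use)[index_strings]
print(list(df_ret.columns))  # ['a', 'b', 'e', 'c']
```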

Upvotes: 2
