Reputation: 713
I have a DataFrame of the following form:
   a  b  c
0  1  4  6
1  3  2  4
2  4  1  5
And I have a list of column names that I need to use to create a new DataFrame using the columns of the first DataFrame that correspond to each label. For example, if my list of columns is ['a', 'b', 'b', 'a', 'c'], the resulting DataFrame should be:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
I've been trying to figure out a fast way of performing this operation because I'm dealing with extremely large DataFrames and I don't think looping is a reasonable option.
Upvotes: 5
Views: 8021
Reputation: 76917
From pandas 0.17 onwards you can use reindex, like so:
In [795]: cols = ['a', 'b', 'b', 'a', 'c']
In [796]: df.reindex(columns=cols)
Out[796]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
Note: Ideally, you don't want to have duplicate column names.
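To illustrate why duplicate column names are usually best avoided, here is a minimal sketch, assuming the example DataFrame from the question. The `unique_names` renaming scheme is a hypothetical convention made up for this example, not something pandas prescribes.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})
cols = ["a", "b", "b", "a", "c"]
dup = df.reindex(columns=cols)

# With duplicate labels, selecting "a" returns a DataFrame holding
# both "a" columns instead of a single Series.
a_cols = dup["a"]
print(type(a_cols).__name__, a_cols.shape)

# Positional access stays unambiguous: this is the first "b" column.
first_b = dup.iloc[:, 1]

# Hypothetical scheme: suffix each label with its position to make it unique.
unique_names = [f"{name}_{i}" for i, name in enumerate(cols)]
renamed = dup.copy()
renamed.columns = unique_names
print(list(renamed.columns))
```

After renaming, each column can again be selected by label as a Series, for example `renamed["b_1"]`.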
Upvotes: 0
Reputation: 67
You can do that directly:
>>> df
a b c
0 1 4 6
1 3 2 4
2 4 1 5
>>> column_names
['a', 'b', 'b', 'a', 'c']
>>> df[column_names]
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
[3 rows x 5 columns]
Upvotes: 3
Reputation: 394041
You can just use the list to select them:
In [44]:
cols = ['a', 'b', 'b', 'a', 'c']
df[cols]
Out[44]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
[3 rows x 5 columns]
So there is no need for a loop: once you have created your DataFrame df, indexing it with a list of column names selects those columns, repeats included, and returns the DataFrame you want.
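The selection above can be reproduced end to end with the following sketch, which builds the example DataFrame from the question and compares list indexing with `reindex`. The missing label `"z"` is a made-up name for illustration, and the `KeyError` behaviour assumes a reasonably recent pandas (1.0 or later).

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})
cols = ["a", "b", "b", "a", "c"]

# Indexing with a list of labels selects columns in the given order,
# repeating any label that appears more than once.
result = df[cols]
print(result)

# reindex produces the same columns and values for labels that all exist.
same = df.reindex(columns=cols)
matches = bool((result.to_numpy() == same.to_numpy()).all())

# The two differ when a label is missing ("z" is hypothetical):
# list indexing raises, while reindex adds a column of NaN.
try:
    df[["a", "z"]]
    raised = False
except KeyError:
    raised = True
filled = df.reindex(columns=["a", "z"])
```

If every requested label is known to exist, `df[cols]` is the more direct choice; `reindex` may be preferable when silently filling missing columns with NaN is acceptable.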
Upvotes: 7