Reputation: 4060
I have a dataframe with a lot of columns in it. Now I want to select only certain columns. I have saved all the names of the columns that I want to select into a Python list and now I want to filter my dataframe according to this list.
I've been trying to do:
df_new = df[[list]]
where list
includes all the column names that I want to select.
However I get the error:
TypeError: unhashable type: 'list'
When I do:
df_new = df[list]
I get the error
KeyError: not in index
Any help on this one?
Upvotes: 28
Views: 60458
Reputation: 23131
[]
aka __getitem__()
The canonical way to select a list of columns from a dataframe is via []
.
df = pd.DataFrame({'a': [1, 1, 1, 1], 'b': [2, 2, 1, 0], 'c': [3, 3, 1, 0]})
cols = ['a', 'b']
df1 = df[cols]
Note that all column labels in cols
have to also be df
(otherwise KeyError: "... not in index"
will be raised).
One thing to note is that when you want to assign new columns to df1
as filtered above (e.g. df1['new'] = 0
), a SettingWithCopyWarning
will be raised. To silence it, explicitly make a new copy:
df1 = df[cols].copy()
KeyError: "... not in index"
To ensure cols
contains only column labels that are in df
, you can call isin
on the columns and then filter df
.
cols = ['a', 'b', 'f']
df1 = df[cols] # <----- error
df1 = df.loc[:, df.columns.isin(cols)] # <----- OK
filter()
Another way to select a list of columns from a dataframe is via filter()
. A nice thing about it is that it creates a copy (so no SettingWithCopyWarning
) and only selects the column labels in cols
that exist in the dataframe, so handles the KeyError
under the hood.
cols = ['a', 'b', 'f']
df1 = df.filter(cols)
As can be seen from the output below, f
in cols
is ignored because it doesn't exist as a column label in df
.
Upvotes: 3
Reputation: 862641
You can remove one []
:
df_new = df[list]
Also better is use other name as list
, e.g. L
:
df_new = df[L]
It look like working, I try only simplify it:
L = []
for x in df.columns:
if not "_" in x[-3:]:
L.append(x)
print (L)
List comprehension
:
print ([x for x in df.columns if not "_" in x[-3:]])
Upvotes: 41