Reputation: 67
I have two dataframes:
The first dataframe is simply one column where each row is a column name in the 2nd dataframe. This dataframe only contains a subset of all the columns.
What I want to do is remove the columns in the 2nd dataframe that are referenced in the smaller one. I've written a loop that does that, but I was wondering if there was a more efficient way to do so, as I need to remove about 5,000 columns.
Here's my code that accomplishes this task:
for i in to_remove['column_name']:
    df = df.drop(i, axis=1)
Thanks!
Upvotes: 1
Views: 48
Reputation: 59264
Take a look at this example:
import pandas as pd

df = pd.DataFrame({'cols': ['col1', 'col2']})
df2 = pd.DataFrame({'col1': ['a', 'b'],
                    'col2': ['a', 'b'],
                    'col3': ['a', 'b'],
                    'col4': ['a', 'b']})
Such that
>>> df
   cols
0  col1
1  col2
>>> df2
  col1 col2 col3 col4
0    a    a    a    a
1    b    b    b    b
isin + ~
You can use isin together with the unary operator ~ to keep only the columns that are not listed in df.cols:
df2.loc[:, ~df2.columns.isin(df.cols)]
  col3 col4
0    a    a
1    b    b
drop + axis=1
df2.drop(df.cols, axis=1) # same as df2.drop(columns=df.cols)
  col3 col4
0    a    a
1    b    b
These commands return a new DataFrame, so do not forget to assign the result back to a variable (e.g. df2 = df2.drop(df.cols, axis=1)).
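As a quick sanity check, here is a minimal sketch that runs both approaches on the same example data and compares the results. The names via_mask, via_drop, same and original_untouched are just illustrative, not part of pandas.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'cols': ['col1', 'col2']})
df2 = pd.DataFrame({'col1': ['a', 'b'],
                    'col2': ['a', 'b'],
                    'col3': ['a', 'b'],
                    'col4': ['a', 'b']})

# Approach 1: boolean mask over the column index
via_mask = df2.loc[:, ~df2.columns.isin(df['cols'])]

# Approach 2: drop all listed columns in a single call
via_drop = df2.drop(columns=df['cols'])

# Both should leave only col3 and col4
same = via_mask.equals(via_drop)

# Neither call modifies df2 itself; it still has all four columns
original_untouched = list(df2.columns) == ['col1', 'col2', 'col3', 'col4']
```

Either way, all columns are removed in one operation instead of one drop call per column, which is where the loop in the question spends its time.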
Upvotes: 1
Reputation: 1133
This should do it:
df.drop(to_remove.column_name, axis=1, inplace=True)
Upvotes: 0
Reputation: 2522
I may be misunderstanding what you are looking for, but the following should work:
df_new = df.drop(columns=to_remove['column_name'])
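If some of the names in to_remove might not exist in df (an assumption; the question says they all do), drop raises a KeyError by default. A minimal sketch with made-up data, using the errors='ignore' option of DataFrame.drop to skip missing labels:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
to_remove = pd.DataFrame({'column_name': ['a', 'c', 'not_there']})

# errors='ignore' silently skips labels that are not columns of df
df_new = df.drop(columns=to_remove['column_name'], errors='ignore')

remaining = list(df_new.columns)
```

Here only column b should remain in df_new, and df itself is left unchanged.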
Upvotes: 0