brie

Reputation: 67

Python - Dataframe contains column names that need to be dropped in another dataframe

I have two dataframes:

The first dataframe has a single column in which each row is the name of a column in the 2nd dataframe. It lists only a subset of the 2nd dataframe's columns.

What I want to do is remove the columns in the 2nd dataframe that are referenced in the smaller one. I've written a loop that does that, but I was wondering if there was a more efficient way to do so, as I need to remove about 5,000 columns.

Here's my code that accomplishes this task:

for i in to_remove['column_name']:
    df = df.drop(i, axis=1)

Thanks!

Upvotes: 1

Views: 48

Answers (3)

rafaelc

Reputation: 59264

Take a look at this example:

import pandas as pd

# df holds the names of the columns to drop; df2 is the dataframe to trim
df = pd.DataFrame({'cols': ['col1', 'col2']})
df2 = pd.DataFrame({'col1': ['a', 'b'],
                    'col2': ['a', 'b'],
                    'col3': ['a', 'b'],
                    'col4': ['a', 'b']})

Such that

>>> df
    cols
0   col1
1   col2

>>> df2
    col1    col2    col3    col4
0   a       a       a       a
1   b       b       b       b

Option 1: isin + ~

You can use isin together with the unary operator ~ to keep only the columns that are not listed in df.cols:

df2.loc[:, ~df2.columns.isin(df.cols)]

    col3    col4
0   a       a
1   b       b

Option 2: drop + axis=1

df2.drop(df.cols, axis=1) # same as df2.drop(columns=df.cols)

    col3    col4
0   a       a
1   b       b

Both commands return a new dataframe and leave df2 unchanged, so do not forget to assign the result back to a variable (e.g. df2 = df2.drop(df.cols, axis=1)).
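By default, drop raises a KeyError if any of the names is not a column of df2. Here is a minimal sketch of one way around that, assuming a made-up extra name 'col9' that does not exist in df2: passing errors='ignore' skips the missing names, and the result is assigned back as described above.

```python
import pandas as pd

# 'col9' is a hypothetical name added for illustration; it is not a column of df2
df = pd.DataFrame({'cols': ['col1', 'col2', 'col9']})
df2 = pd.DataFrame({'col1': ['a', 'b'],
                    'col2': ['a', 'b'],
                    'col3': ['a', 'b'],
                    'col4': ['a', 'b']})

# errors='ignore' skips labels that are absent from df2 instead of raising KeyError
df2 = df2.drop(columns=df['cols'].tolist(), errors='ignore')

print(list(df2.columns))  # ['col3', 'col4']
```

The isin approach from Option 1 tolerates missing names without any extra argument, since it only builds a boolean mask over the existing columns.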

Upvotes: 1

Peybae

Reputation: 1133

This should do it:

df.drop(to_remove.column_name, axis=1, inplace=True)
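One thing to keep in mind with inplace=True is that the call modifies df directly and returns None, so its result should not be assigned back. A small sketch, assuming toy data in place of the asker's real df and to_remove:

```python
import pandas as pd

# Toy stand-ins for the asker's dataframes (values are made up)
to_remove = pd.DataFrame({'column_name': ['col1', 'col2']})
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})

# With inplace=True the columns are removed from df itself
result = df.drop(to_remove['column_name'], axis=1, inplace=True)

print(result)            # None
print(list(df.columns))  # ['col3']
```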

Upvotes: 0

tobsecret

Reputation: 2522

I may be misunderstanding what you are looking for, but the following should work:

df_new = df.drop(columns=to_remove['column_name'])
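As a quick sanity check, here is a sketch (on made-up toy data, not the asker's real dataframes) comparing the loop from the question with the single drop call. Both should give the same result, while the single call avoids building an intermediate dataframe for every column:

```python
import pandas as pd

# Toy stand-ins for the asker's dataframes (values are made up)
to_remove = pd.DataFrame({'column_name': ['col1', 'col2']})
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
                   'col3': [5, 6], 'col4': [7, 8]})

# Loop from the question: one drop call (and one new dataframe) per column
looped = df.copy()
for name in to_remove['column_name']:
    looped = looped.drop(name, axis=1)

# Single call: all listed columns are dropped at once; df itself is untouched
df_new = df.drop(columns=to_remove['column_name'])

print(looped.equals(df_new))  # True
print(list(df_new.columns))   # ['col3', 'col4']
```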

Upvotes: 0
