bismo

Reputation: 1439

How to drop columns based on column name python pandas

I would like to drop every column whose name ends in 'y' from my data frame. For some reason, my data has each column listed twice, with only the column name differing, like so:

import pandas as pd

d = {'Team': ['1', '2', '3'], 'Team_y': ['1', '2', '3'], 'Color': ['red', 'green', 'blue'], 'Color_y': ['red', 'green', 'blue']}
df = pd.DataFrame(data=d)
df

  Team Team_y  Color Color_y
0    1      1    red     red
1    2      2  green   green
2    3      3   blue    blue

I know it's some sort of string formatting. I tried indexing the last letter using [-1] but couldn't quite get it to work. Thanks!

Upvotes: 6

Views: 8772

Answers (3)

sammywemmy

Reputation: 28709

In addition to @David's answer, you could use pandas' str.endswith to exclude columns ending with '_y':

df.loc[:,~df.columns.str.endswith('_y')]

  Team  Color
0    1    red
1    2  green
2    3   blue

The ~ (tilde) operator negates the boolean mask, so only the columns that do not end with '_y' are kept.
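To make the negation concrete, here is a minimal sketch that builds the mask in a separate step, using the frame from the question. The names mask and result are only illustrative.

```python
import pandas as pd

d = {'Team': ['1', '2', '3'], 'Team_y': ['1', '2', '3'],
     'Color': ['red', 'green', 'blue'], 'Color_y': ['red', 'green', 'blue']}
df = pd.DataFrame(data=d)

# True for every column name that ends with '_y'
mask = df.columns.str.endswith('_y')
print(mask.tolist())  # [False, True, False, True]

# ~mask flips each boolean, so .loc keeps the columns that do NOT end with '_y'
result = df.loc[:, ~mask]
print(list(result.columns))  # ['Team', 'Color']
```

This returns a new frame and leaves df itself unchanged.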

pyjanitor's select_columns offers a higher-level abstraction that might be helpful:

# pip install pyjanitor
import janitor
import pandas as pd

df.select_columns('*y', invert=True)

  Team  Color
0    1    red
1    2  green
2    3   blue

Upvotes: 6

jcaliz

Reputation: 4021

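Use filter with a regular expression. Anchor the pattern with $ so that it matches only names ending in '_y' rather than names containing '_y' anywhere:

df = df[df.columns.drop(list(df.filter(regex='_y$')))]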
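Since DataFrame.filter applies the regex as a search, a pattern without $ matches '_y' anywhere in the name. Here is a minimal sketch of the difference; the 'Team_years' column is a hypothetical addition that is not in the question's data.

```python
import pandas as pd

# 'Team_years' is hypothetical, added only to show the difference
df = pd.DataFrame({'Team': ['1', '2'], 'Team_y': ['1', '2'], 'Team_years': ['3', '4']})

# Unanchored: matches '_y' anywhere in the column name
unanchored = list(df.filter(regex='_y').columns)
print(unanchored)  # ['Team_y', 'Team_years']

# Anchored with $: matches only names that end with '_y'
anchored = list(df.filter(regex='_y$').columns)
print(anchored)  # ['Team_y']

# Drop only the columns that really end with '_y'
kept = df.drop(columns=anchored)
print(list(kept.columns))  # ['Team', 'Team_years']
```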

Upvotes: 3

David Erickson

Reputation: 16683

Drop columns based on a string condition:

df.drop([col for col in df.columns if '_y' in col], axis=1, inplace=True)

Better yet, if the match must be specific to names ending with '_y', then:

df.drop([col for col in df.columns if col.endswith('_y')], axis=1, inplace=True)
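If you would rather keep the original frame intact, the same idea can be sketched without inplace=True, using the columns keyword of drop. The names to_drop and cleaned are only illustrative.

```python
import pandas as pd

d = {'Team': ['1', '2', '3'], 'Team_y': ['1', '2', '3'],
     'Color': ['red', 'green', 'blue'], 'Color_y': ['red', 'green', 'blue']}
df = pd.DataFrame(data=d)

# Collect the names that end with '_y'
to_drop = [col for col in df.columns if col.endswith('_y')]

# drop returns a new DataFrame; df keeps all four columns
cleaned = df.drop(columns=to_drop)
print(list(cleaned.columns))  # ['Team', 'Color']
```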

Upvotes: 5
