Reputation: 1439
I would like to drop every column whose name ends in a 'y' from my data frame. For some reason, each column in my data is listed twice, and the only difference between the two copies is the column name, like so:
import pandas as pd
d = {'Team': ['1', '2', '3'], 'Team_y': ['1', '2', '3'], 'Color': ['red', 'green', 'blue'], 'Color_y': ['red', 'green', 'blue']}
df = pd.DataFrame(data=d)
df
  Team Team_y  Color Color_y
0    1      1    red     red
1    2      2  green   green
2    3      3   blue    blue
I know it's some sort of string formatting. I tried indexing the last letter using [-1] but couldn't quite get it to work. Thanks!
Upvotes: 6
Views: 8772
Reputation: 28709
In addition to @David's answer, you could use pandas' str.endswith on the column index to exclude columns ending with '_y':
df.loc[:,~df.columns.str.endswith('_y')]
  Team  Color
0    1    red
1    2  green
2    3   blue
The ~ (tilde) operator negates the boolean mask, so only the columns that do not end with '_y' are selected.
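To make the negation step concrete, here is a minimal sketch that splits the one-liner into two steps, using the question's sample frame. The names mask and kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['1', '2', '3'],
    'Team_y': ['1', '2', '3'],
    'Color': ['red', 'green', 'blue'],
    'Color_y': ['red', 'green', 'blue'],
})

# Boolean array: True where the column name ends with '_y'
mask = df.columns.str.endswith('_y')
print(mask.tolist())          # [False, True, False, True]

# ~ flips every element, so True now marks the columns to keep
kept = df.loc[:, ~mask]
print(kept.columns.tolist())  # ['Team', 'Color']
```

Note that this returns a new selection and leaves df itself unchanged; assign the result back to df if you want to overwrite it.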
Alternatively, pyjanitor's select_columns offers a higher-level way to do this with a shell-style pattern:
# pip install pyjanitor
import janitor
import pandas as pd
df.select_columns('*y', invert = True)
  Team  Color
0    1    red
1    2  green
2    3   blue
Upvotes: 6
Reputation: 4021
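Filter the column names with a regular expression. Anchor the pattern with $ so that it only matches names that end in '_y', not names that merely contain it:
df = df[df.columns.drop(list(df.filter(regex='_y$')))]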
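As a rough illustration of why anchoring matters: DataFrame.filter applies the regex as a search anywhere in the label, so a bare '_y' also catches names that only contain it. The frame df2 and its 'my_year' column below are made up for the demonstration and are not from the question.

```python
import pandas as pd

# 'my_year' is a hypothetical column that contains '_y' but does not end with it
df2 = pd.DataFrame({'Team': ['1'], 'Team_y': ['1'], 'my_year': [2020]})

# Unanchored pattern: matches '_y' anywhere in the column name
unanchored = list(df2.filter(regex='_y').columns)
print(unanchored)  # ['Team_y', 'my_year']

# Anchored pattern: '$' restricts the match to the end of the name
anchored = list(df2.filter(regex='_y$').columns)
print(anchored)    # ['Team_y']

# Drop only the columns that really end in '_y'
result = df2[df2.columns.drop(anchored)]
print(result.columns.tolist())  # ['Team', 'my_year']
```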
Upvotes: 3
Reputation: 16683
Drop columns based on a string condition, here any column whose name contains '_y':
df.drop([col for col in df.columns if '_y' in col],axis=1,inplace=True)
Better yet, if the match must be limited to names that end with '_y', use endswith:
df.drop([col for col in df.columns if col.endswith('_y')],axis=1,inplace=True)
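If you would rather not mutate the frame in place, the same list comprehension can feed drop(columns=...), which returns a new frame. This is only a sketch of that variant on the question's sample data; to_drop and trimmed are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['1', '2', '3'],
    'Team_y': ['1', '2', '3'],
    'Color': ['red', 'green', 'blue'],
    'Color_y': ['red', 'green', 'blue'],
})

# Collect the names to drop first, so they are easy to inspect
to_drop = [col for col in df.columns if col.endswith('_y')]
print(to_drop)                   # ['Team_y', 'Color_y']

# drop(columns=...) is equivalent to drop(..., axis=1) and returns a copy
trimmed = df.drop(columns=to_drop)
print(trimmed.columns.tolist())  # ['Team', 'Color']
print(df.columns.tolist())       # original still has all four columns
```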
Upvotes: 5