Reputation: 7713
I have a dataframe which can be generated from the code below
df = pd.DataFrame({'person_id' :[1,2,3],'date1': ['12/31/2007','11/25/2009','10/06/2005'],'date1derived':[0,0,0],'val1':[2,4,6],'date2': ['12/31/2017','11/25/2019','10/06/2015'],'date2derived':[0,0,0],'val2':[1,3,5],'date3':['12/31/2027','11/25/2029','10/06/2025'],'date3derived':[0,0,0],'val3':[7,9,11]})
The dataframe looks like as shown below
I would like to remove columns that contain "derived" in their name. I tried different regex but couldn't get the expected output.
df = df.filter(regex='[^H\dDerived]+', axis=1)
df = df.filter(regex='[^Derived]',axis=1)
Can you let me know the right regex to do this?
Upvotes: 1
Views: 2447
Reputation: 4315
In recent versions of pandas, you can use string methods on the index and columns. Here, str.endswith seems like a good fit.
import pandas as pd
df = pd.DataFrame({'person_id' :[1,2,3],'date1': ['12/31/2007','11/25/2009','10/06/2005'],
'date1derived':[0,0,0],'val1':[2,4,6],'date2': ['12/31/2017','11/25/2019','10/06/2015'],
'date2derived':[0,0,0],'val2':[1,3,5],'date3':['12/31/2027','11/25/2029','10/06/2025'],
'date3derived':[0,0,0],'val3':[7,9,11]})
df = df.loc[:,~df.columns.str.endswith('derived')]
print(df)
O/P:
person_id date1 val1 date2 val2 date3 val3
0 1 12/31/2007 2 12/31/2017 1 12/31/2027 7
1 2 11/25/2009 4 11/25/2019 3 11/25/2029 9
2 3 10/06/2005 6 10/06/2015 5 10/06/2025 11
Upvotes: 1
Reputation: 75080
pd.Index.difference()
with df.filter()
df[df.columns.difference(df.filter(like='derived').columns,sort=False)]
person_id date1 val1 date2 val2 date3 val3
0 1 12/31/2007 2 12/31/2017 1 12/31/2027 7
1 2 11/25/2009 4 11/25/2019 3 11/25/2029 9
2 3 10/06/2005 6 10/06/2015 5 10/06/2025 11
Upvotes: 1
Reputation: 25239
IIUC, you want to drop columns has derived
in it. This should do:
df.drop(df.filter(like='derived').columns, 1)
Out[455]:
person_id date1 val1 date2 val2 date3 val3
0 1 12/31/2007 2 12/31/2017 1 12/31/2027 7
1 2 11/25/2009 4 11/25/2019 3 11/25/2029 9
2 3 10/06/2005 6 10/06/2015 5 10/06/2025 11
Upvotes: 2
Reputation: 42017
You can use a zero-width negative lookahead to make sure the string derived
does not come anywhere:
^(?!.*?derived)
^
matches the start of the string(?!.*?derived)
is the negative lookahead pattern that makes sure derived
does not come in the stringYour pattern [^Derived]
will match any single character that are not one of D/e/r/i/v/e/d .
Upvotes: 2
Reputation: 3770
df[[c for c in df.columns if 'derived' not in c ]]
Output
person_id date1 val1 date2 val2 date3 val3
0 1 12/31/2007 2 12/31/2017 1 12/31/2027 7
1 2 11/25/2009 4 11/25/2019 3 11/25/2029 9
2 3 10/06/2005 6 10/06/2015 5 10/06/2025 11
Upvotes: 1