Reputation: 1162
I have a Pandas Dataframe containing multiple colums of strings. I now like to check a certain column against a list of allowed substrings and then get a new subset with the result.
substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings
The only approach I found was creating a List of the corresponding column an then do a list comprehension, but then I loose the other columns. Can I use list comprehension as part of e.g. df.str.contains()
?
year type value price
2000 ty-A 500 10000
2002 ty-Q 200 84600
2003 ty-R 500 56000
2003 ty-B 500 18000
2006 ty-C 500 12500
2012 ty-A 500 65000
2018 ty-F 500 86000
2019 ty-D 500 51900
expected output:
year type value price
2000 ty-A 500 10000
2006 ty-C 500 12500
2012 ty-A 500 65000
2019 ty-D 500 51900
Upvotes: 2
Views: 4568
Reputation: 123
you could use pandas.DataFrame.any or pandas.DataFrame.all
if you want where all instances match
df.loc[df['type'].apply(lambda x: all( word in x for word in substr)
or if you want any from the substr
df.loc[df['type'].apply(lambda x: any( word in x for word in substr)
That should if you print or return df a filtered list.
Upvotes: 1
Reputation: 3910
You could use pandas.Series.isin
>>> df.loc[df['type'].isin(substr)]
year type value price
0 2000 A 500 10000
4 2006 C 500 12500
5 2012 A 500 65000
7 2019 D 500 51900
Upvotes: 3