Filter Pandas Dataframe based on List of substrings

Question

I have a Pandas Dataframe containing multiple colums of strings. I now like to check a certain column against a list of allowed substrings and then get a new subset with the result.

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings

The only approach I found was creating a List of the corresponding column an then do a list comprehension, but then I loose the other columns. Can I use list comprehension as part of e.g. df.str.contains()?

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900

Hryhorii Pavlenko · Accepted Answer

You could use pandas.Series.isin

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900

Filter Pandas Dataframe based on List of substrings

Answers (2)

Related Questions