Reputation: 451
Let's say we have a dataframe df with a column labelled 'A'. For selecting rows that match ONE string, 'some_string', df['A'].str.contains('some_string') works great.
My question is: is there a corresponding way to pass contains a list of strings, so that partial matches are found? Instead of 'some_string', can I give it a list of strings? I am trying to avoid looping, slicing the dataframe, and concatenating the pieces into a new dataframe.
Let's say the dataframe is
pd.DataFrame(np.array([['cat', 2], ['rat', 5], ['ball', 8], ['string', 8]]), columns=['A', 'B'])
and
list = ['at', 'll', 'ac']
So I want to select the rows with cat, rat, and ball. Sorry for the contrived example.
Upvotes: 3
Views: 7831
Reputation: 83
The simplest pandas-friendly approach, if exact (whole-value) matches are enough, is:
list_of_strings = ['string1', 'string2']
df[df['A'].isin(list_of_strings)]
From https://sparkbyexamples.com/pandas/pandas-use-a-list-of-values-to-select-rows-from-dataframe/
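As a quick check against the question's data, here is a minimal sketch (the variable names partial and exact are illustrative, not from the original post). It suggests that isin compares whole cell values, so the substrings from the question select nothing, while full values do match:

```python
import pandas as pd

df = pd.DataFrame({"A": ["cat", "rat", "ball", "string"], "B": [2, 5, 8, 8]})

# isin compares whole cell values, so substrings do not match anything.
partial = df[df["A"].isin(["at", "ll", "ac"])]
print(len(partial))  # 0

# Full values do match.
exact = df[df["A"].isin(["cat", "ball"])]
print(exact["A"].tolist())  # ['cat', 'ball']
```

So this approach fits when the list holds complete values rather than fragments.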
Upvotes: 2
Reputation: 7186
pandas.Series.str.contains takes either a string or a regex. So you could just build a regex from the list of strings:
import pandas as pd
strings = "fo", "ba"
x = pd.Series(["foo", "bar", "baz", "buzz"])
x.str.contains("|".join(strings))
# 0 True
# 1 True
# 2 True
# 3 False
# dtype: bool
This might be slow if your list of strings to match against is very long, and you might need na=False to ignore NaN values, as mentioned in the comments by @anky_91.
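Applied to the dataframe from the question, this could look like the sketch below. The extra row with a missing value and the use of re.escape are my additions, not part of the original answer: escaping is one way to make sure substrings containing regex metacharacters (such as '.' or '+') are matched literally.

```python
import re
import pandas as pd

# The question's data, plus one assumed row with a missing value in 'A'.
df = pd.DataFrame({"A": ["cat", "rat", "ball", "string", None], "B": [2, 5, 8, 8, 1]})
substrings = ["at", "ll", "ac"]

# Escape each substring so regex metacharacters are matched literally.
pattern = "|".join(re.escape(s) for s in substrings)

# na=False makes missing values count as non-matches instead of propagating NaN.
mask = df["A"].str.contains(pattern, na=False)
matched = df[mask]
print(matched["A"].tolist())  # ['cat', 'rat', 'ball']
```

Without na=False, the mask would contain a missing value for the last row, and indexing with it would typically fail.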
Upvotes: 5
Reputation: 1055
If A always contains exactly the string you want to find in the list, you can do this:
df['A'].map(lambda x: 1 if x in list_of_strings else 0)
The lambda function checks, for each row, whether the value in 'A' (temporarily stored in x) exists as one of the elements in list_of_strings, and returns 1 or 0 accordingly.
You can then filter for the rows where this new mapped column is 1.
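A minimal sketch of the whole flow, using the question's data and an assumed list_of_strings of full values (the names flag and selected are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": ["cat", "rat", "ball", "string"], "B": [2, 5, 8, 8]})
list_of_strings = ["cat", "ball"]

# Map each value in 'A' to 1 if it is an element of the list, else 0.
flag = df["A"].map(lambda x: 1 if x in list_of_strings else 0)

# Keep only the rows where the flag is 1.
selected = df[flag == 1]
print(selected["A"].tolist())  # ['cat', 'ball']
```

Like isin, this compares whole values, so it does not find partial matches such as 'at' inside 'cat'.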
Upvotes: 0