Reputation: 1690
I would like to filter the rows of a dataframe, keeping those whose column value contains any of the strings in a list. I am trying to use query for this, since it usually handles filtering well and keeps the code elegant. I have tried:
my_list = ['red', 'blue', 'green', 'yellow']
df_new = df.query("`User Color` in @my_list")
I am looking for a function that works like in (that is, one that checks whether the string is contained in the value).
My dataframe df looks kind of like this:
name     id   User Color   Age
Luis     876  blue, green  35
Charles  12   blue, brown  34
Luna     654  black        24
Anna     987  brown        19
Silvana  31   red, black   26
Juliet   55   red          20
And the output I expect should be:
name     id   User Color   Age
Luis     876  blue, green  35
Charles  12   blue, brown  34
Silvana  31   red, black   26
Juliet   55   red          20
Upvotes: 2
Views: 146
Reputation: 14011
You need to split the value in each row and check whether any of the resulting pieces is present in your list.
This can be done with map:
df_subset = df[df['User Color'].map(lambda val: any(x in my_list for x in val.split(',')))]
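Since this is an exact string match, consider stripping and lowercasing the split values, depending on your requirements. For example, splitting 'blue, green' on ',' yields ' green' with a leading space, which does not equal 'green'.
The same logic, written out more explicitly: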
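def filter_color(val):
    # True if any comma-separated color in val is in my_list
    for x in val.split(','):
        if x.lower().strip() in my_list:
            return True
    return False

# Apply the check to the color column, not to 'name'
df_subset = df[df['User Color'].map(filter_color)]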
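For anyone who wants to run this end to end, here is a minimal self-contained sketch of the split-and-check approach. The dataframe is rebuilt from the sample in the question, and the names wanted and has_wanted_color are my own, chosen for illustration. It assumes the colors are always separated by commas.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "id": [876, 12, 654, 987, 31, 55],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
    "Age": [35, 34, 24, 19, 26, 20],
})
my_list = ["red", "blue", "green", "yellow"]

# A set gives fast membership tests; lowercase it once up front
wanted = {c.lower() for c in my_list}

def has_wanted_color(val):
    # Split on commas, then normalise each piece before comparing
    return any(part.strip().lower() in wanted for part in val.split(","))

df_subset = df[df["User Color"].map(has_wanted_color)]
print(df_subset["name"].tolist())  # ['Luis', 'Charles', 'Silvana', 'Juliet']
```

Because each piece is compared as a whole value, a color such as 'darkred' would not be picked up by 'red'.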
Upvotes: 2
Reputation: 28709
Building off @DavidErickson's solution, using the query method:
df.query("`User Color`.str.contains('|'.join(@my_list))")
   name     id   User Color   Age
0  Luis     876  blue, green  35
1  Charles  12   blue, brown  34
4  Silvana  31   red, black   26
5  Juliet   55   red          20
Upvotes: 1
Reputation: 16683
Instead of splitting the dataframe column, you could do the inverse and join the list. You could use join with str.contains. NOTE: this is less robust, because it matches substrings rather than whole values, so 'red' would also match a value such as 'darkred':
df[df['User Color'].str.contains('|'.join(my_list))]
Out[1]:
   name     id   User Color   Age
0  Luis     876  blue, green  35
1  Charles  12   blue, brown  34
4  Silvana  31   red, black   26
5  Juliet   55   red          20
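If the substring behaviour is a concern, one possible tightening is to wrap the joined pattern in regex word boundaries. The sketch below is an assumption on my part, not part of the original answer: the row for 'Nora' with the color 'darkred' is hypothetical and only there to show the difference, and re.escape is used in case a list entry contains regex metacharacters.

```python
import re
import pandas as pd

# Colors from the question plus one hypothetical row ('Nora', 'darkred')
df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet", "Nora"],
    "User Color": ["blue, green", "blue, brown", "black", "brown",
                   "red, black", "red", "darkred"],
})
my_list = ["red", "blue", "green", "yellow"]

# Plain join: matches anywhere inside the string
loose = df[df["User Color"].str.contains("|".join(my_list))]

# Word-bounded pattern: each color must appear as a whole word
pattern = r"\b(?:" + "|".join(map(re.escape, my_list)) + r")\b"
strict = df[df["User Color"].str.contains(pattern)]

print(loose["name"].tolist())   # includes 'Nora'
print(strict["name"].tolist())  # excludes 'Nora'
```

The non-capturing group (?:...) keeps the alternation inside the boundaries, so the \b anchors apply to every color and not just the first and last.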
Upvotes: 1