The Dan

Reputation: 1690

Use dataframe.query to select values from a list contained in a pd.DataFrame

I would like to filter the rows of a dataframe whose User Color column contains any of the strings in a list. I am trying to use queries, since they are usually well suited to this job and keep the code elegant. I have tried:

my_list = ['red', 'blue', 'green', 'yellow']

df_new = df.query("`User Color` in @my_list")

I am looking for something that works like in, but that matches when the string is contained in the cell rather than equal to the whole cell.

My dataframe df looks kind of like this:

name      id    User Color    Age 
Luis      876   blue, green   35
Charles   12    blue, brown   34
Luna      654   black         24
Anna      987   brown         19
Silvana   31    red, black    26
Juliet    55    red           20

And the output I expect should be:

name      id    User Color    Age 
Luis      876   blue, green   35
Charles   12    blue, brown   34
Silvana   31    red, black    26
Juliet    55    red           20

Upvotes: 2

Views: 146

Answers (3)

Vikash Singh

Reputation: 14011

You need to split the values in each row and check if any of those values are present in your selected list.

This can be done with a map function

df_subset = df[df['User Color'].map(lambda val: any(x in my_list for x in val.split(',')))]

Since this is an exact string match, consider stripping whitespace from the split values and lowercasing them, depending on your requirements.

Similar code to above, but descriptive:

def filter_color(val):
  for x in val.split(','):
    if x.lower().strip() in my_list:
      return True
  return False

df_subset = df[df['User Color'].map(filter_color)]
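
For reference, here is a minimal self-contained sketch of the split-and-check approach run against the sample data from the question. The dataframe is rebuilt by hand from the table in the question, and the helper name has_wanted_color and the wanted set are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "id": [876, 12, 654, 987, 31, 55],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
    "Age": [35, 34, 24, 19, 26, 20],
})
my_list = ["red", "blue", "green", "yellow"]

# A set gives fast membership tests; the name is illustrative.
wanted = set(my_list)

def has_wanted_color(val):
    # Split on commas, strip and lowercase each piece, then test for an exact match.
    return any(part.strip().lower() in wanted for part in val.split(","))

df_subset = df[df["User Color"].map(has_wanted_color)]
print(df_subset["name"].tolist())
```

Stripping each piece matters here because splitting "blue, green" on a comma leaves a leading space on " green".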

Upvotes: 2

sammywemmy

Reputation: 28709

Building off @DavidErickson's solution, using the query method:

df.query("`User Color`.str.contains('|'.join(@my_list))")

    name    id  User Color  Age
0   Luis    876 blue, green 35
1   Charles 12  blue, brown 34
4   Silvana 31  red, black  26
5   Juliet  55  red         20
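
A slightly more defensive variant, as a sketch: build the pattern outside the query string and pass engine="python", which is the engine generally used for string accessor calls inside query. The variable name pattern is just an illustration, and the dataframe is rebuilt from the question's sample:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "id": [876, 12, 654, 987, 31, 55],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
    "Age": [35, 34, 24, 19, 26, 20],
})
my_list = ["red", "blue", "green", "yellow"]

# Join the list into one regex alternation: "red|blue|green|yellow".
pattern = "|".join(my_list)

# Backticks quote the column name with a space; @pattern refers to the local variable.
df_new = df.query("`User Color`.str.contains(@pattern)", engine="python")
print(df_new["name"].tolist())
```

Like the one-liner above, this is still a substring match, so it carries the same caveat about partial matches.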

Upvotes: 1

David Erickson

Reputation: 16683

Instead of splitting the dataframe column, you could do the inverse and join the list into a single regex pattern, then use it with str.contains. NOTE: this is less robust because it matches substrings rather than whole values, so 'red' would also match a longer word that merely contains it:

df[df['User Color'].str.contains('|'.join(my_list))]

Out[1]: 
      name   id   User Color  Age
0     Luis  876  blue, green   35
1  Charles   12  blue, brown   34
4  Silvana   31   red, black   26
5   Juliet   55          red   20
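
To illustrate the caveat, here is a small sketch comparing the plain joined pattern with one wrapped in word boundaries. The value "darkred" is made up for the demonstration and is not in the question's data; the names loose and strict are also just illustrative:

```python
import re
import pandas as pd

# Hypothetical values; "darkred" is invented to show a partial match.
colors = pd.Series(["red", "darkred, black", "blue, green", "brown"])
my_list = ["red", "blue", "green", "yellow"]

# Plain alternation: matches any cell that contains one of the words as a substring.
loose = "|".join(my_list)

# Word-boundary version: re.escape guards against regex metacharacters in the list,
# and the non-capturing group keeps \b applied to every alternative.
strict = r"\b(?:" + "|".join(re.escape(c) for c in my_list) + r")\b"

loose_mask = colors.str.contains(loose)
strict_mask = colors.str.contains(strict)
print(loose_mask.tolist())
print(strict_mask.tolist())
```

With the loose pattern the "darkred, black" row is selected because it contains "red"; with the word-boundary pattern it is not.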

Upvotes: 1
