magladde
magladde

Reputation: 634

Selecting rows from a dataframe using a list of values

I have a list of values that I would like to use to select rows in a dataframe. The trick is I would like to select any row where the list value is in the row. Example:

index    color    shape
 1       blue     star
 2       red      square
 3       yellow   circle

My list would be

list_vals = ['sq', 'blu']

I would like to select the rows

index    color   shape
1        blue    star
2        red     square

Upvotes: 2

Views: 422

Answers (4)

edesz
edesz

Reputation: 12406

Here is one approach

df_filtered = (
    df[(df['color'].str.contains(list_vals[0])) |
        (df['shape'].str.contains(list_vals[1]))
        ]
                )

print(df_filtered)
   index color   shape
0      1  blue    star
1      2   red  square

EDIT

Another approach is based on this SO post (which contains the full explanation of this method)

  • the only changes I made were (1) to join your search list into a single search string and (2) to return the DataFrame (row) index of the search (filtered) results (this is then used to slice the original DataFrame)
def find_subtext(df, txt):
    contains = df.stack().str.contains(txt).unstack()
    return contains[contains.any(1)].index
df_filtered = find_subtext(df, '|'.join(list_vals))

print(df.iloc[df_filtered, :])
   index color   shape
0      1  blue    star
1      2   red  square

Upvotes: 2

Chris Adams
Chris Adams

Reputation: 18647

Use DataFrame.stack to convert to a Series, then use Series.str.contains to find the strings your interested in - we'll use '|'.join to create a regex 'OR' patter combining all items from list_items.

For reference, this regex pattern looks like 'sq|blu' in this case.

Next, Series.unstack to get back to original shape and use DataFrame.any over axis 1 to create the boolean index we'll use to return the desired rows.

df[df.stack().str.contains('|'.join(list_vals)).unstack().any(1)]

[out]

   ndex color   shape
0     1  blue    star
1     2   red  square

Upvotes: 3

anky
anky

Reputation: 75150

Or join the list with a pipe and check with str.contains() over the df:

df[df.apply(lambda x: x.str.contains('|'.join(list_vals))).any(axis=1)]

       color   shape
index              
1      blue    star
2       red  square

Upvotes: 2

iamklaus
iamklaus

Reputation: 3770

df[df['shape'].apply(lambda x: any(s in x[:len(s)] for s in list_vals))]

Output

  color   shape
1   red  square

Upvotes: 2

Related Questions