Reputation: 634
I have a list of values that I would like to use to select rows in a dataframe. The trick is I would like to select any row where the list value is in the row. Example:
index color shape
1 blue star
2 red square
3 yellow circle
My list would be
list_vals = ['sq', 'blu']
I would like to select the rows
index color shape
1 blue star
2 red square
Upvotes: 2
Views: 422
Reputation: 12406
Here is one approach
df_filtered = (
df[(df['color'].str.contains(list_vals[0])) |
(df['shape'].str.contains(list_vals[1]))
]
)
print(df_filtered)
index color shape
0 1 blue star
1 2 red square
EDIT
Another approach is based on this SO post (which contains the full explanation of this method)
DataFrame
(row) index of the search (filtered) results (this is then used to slice the original DataFrame
)def find_subtext(df, txt):
contains = df.stack().str.contains(txt).unstack()
return contains[contains.any(1)].index
df_filtered = find_subtext(df, '|'.join(list_vals))
print(df.iloc[df_filtered, :])
index color shape
0 1 blue star
1 2 red square
Upvotes: 2
Reputation: 18647
Use DataFrame.stack
to convert to a Series
, then use Series.str.contains
to find the strings your interested in - we'll use '|'.join
to create a regex 'OR' patter combining all items from list_items
.
For reference, this regex pattern looks like 'sq|blu'
in this case.
Next, Series.unstack
to get back to original shape and use DataFrame.any
over axis 1 to create the boolean index we'll use to return the desired rows.
df[df.stack().str.contains('|'.join(list_vals)).unstack().any(1)]
[out]
ndex color shape
0 1 blue star
1 2 red square
Upvotes: 3
Reputation: 75150
Or join the list with a pipe and check with str.contains()
over the df:
df[df.apply(lambda x: x.str.contains('|'.join(list_vals))).any(axis=1)]
color shape
index
1 blue star
2 red square
Upvotes: 2
Reputation: 3770
df[df['shape'].apply(lambda x: any(s in x[:len(s)] for s in list_vals))]
Output
color shape
1 red square
Upvotes: 2