Reputation: 85
I have a pandas dataframe containing a list of strings in a column called contains_and. Now I want to select the rows from that dataframe whose words in contains_and are all contained in a given string, e.g.
import pandas as pd
example: str = "I'm really satisfied with the quality and the price of product X"
df: pd.DataFrame = pd.DataFrame({"columnA": [1,2], "contains_and": [["price","quality"],["delivery","speed"]]})
resulting in a dataframe like this:
   columnA       contains_and
0        1   [price, quality]
1        2  [delivery, speed]
Now, I would like to select only the first row (index 0), as example contains all words in the list in contains_and.
My initial instinct was to do the following:
df.loc[
all([word in example for word in df["contains_and"]])
]
However, doing that results in the following error:
TypeError: 'in <string>' requires string as left operand, not list
I'm not quite sure how to best do this, but it seems like something that shouldn't be all too difficult. Does someone know of a good way to do this?
Upvotes: 1
Views: 1158
Reputation: 620
Based on @Nk03's answer, you could also try:
df = df[df.contains_and.apply(lambda x: all(q in example for q in x))]
In my opinion, it is more intuitive to check whether the words are in example, rather than the opposite, as your first attempt does.
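Whether the per-row check uses any or all matters as soon as a row matches only partially. Here is a minimal sketch of the difference, assuming a hypothetical extra row (["price", "delivery"]) that is not in the question's data:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"

# Hypothetical data: the second row matches only partially
# ("price" occurs in example, "delivery" does not).
df = pd.DataFrame({
    "columnA": [1, 2, 3],
    "contains_and": [["price", "quality"], ["price", "delivery"], ["delivery", "speed"]],
})

# True when at least one word of the row occurs in example
any_mask = df["contains_and"].apply(lambda words: any(word in example for word in words))

# True only when every word of the row occurs in example
all_mask = df["contains_and"].apply(lambda words: all(word in example for word in words))

print(df[any_mask])  # keeps the partially matching row as well
print(df[all_mask])  # keeps only the fully matching row
```

With the two rows from the question both masks give the same result, so the difference only shows up on data with partial matches.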
Upvotes: 0
Reputation: 18316
Another way is exploding the list of candidate words and checking (per row) whether they are all among the words of example, which are found with str.split:
# a Series of words
ex = pd.Series(example.split())
# boolean array reduced with `all`
to_keep = df["contains_and"].explode().isin(ex).groupby(level=0).all()
# keep only "True" rows
new_df = df[to_keep]
to get
>>> new_df
   columnA      contains_and
0        1  [price, quality]
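One thing to be aware of: comparing against example.split() matches whole whitespace-separated words, while word in example matches substrings. A small sketch of the difference, assuming a hypothetical row containing "rice" (a substring of "price") that is not in the question's data:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"

# Hypothetical data: "rice" is a substring of "price" but not a word of its own.
df = pd.DataFrame({
    "columnA": [1, 2],
    "contains_and": [["price", "quality"], ["rice", "quality"]],
})

# Substring test: "rice" in example is True because of "price"
substring_mask = df["contains_and"].apply(lambda words: all(w in example for w in words))

# Whole-word test: compare against the whitespace-separated tokens of example
tokens = pd.Series(example.split())
word_mask = df["contains_and"].explode().isin(tokens).groupby(level=0).all()

print(substring_mask.tolist())
print(word_mask.tolist())
```

Note that str.split only splits on whitespace, so punctuation stays attached to a token (e.g. "price," would not equal "price"). If the text contains punctuation, some extra cleaning would probably be needed first.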
Upvotes: 1
Reputation: 14949
One way:
df = df[df.contains_and.apply(lambda x: all((i in example) for i in x))]
OUTPUT:
   columnA      contains_and
0        1  [price, quality]
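To spell out why this works where the attempt in the question fails: iterating over df["contains_and"] yields each row's whole list, so word in example ends up testing a list against a string. With apply, the function is called once per row and receives that row's list. A rough sketch of the same idea with a named helper (contains_all is a hypothetical name, not a pandas function):

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({
    "columnA": [1, 2],
    "contains_and": [["price", "quality"], ["delivery", "speed"]],
})

def contains_all(words, text):
    """Hypothetical helper: True if every string in words is a substring of text."""
    return all(word in text for word in words)

# Iterating the column yields whole lists, not individual words
row_types = [type(words).__name__ for words in df["contains_and"]]
print(row_types)

# apply calls the helper once per row, passing that row's list
mask = df["contains_and"].apply(lambda words: contains_all(words, example))
result = df[mask]
print(result)
```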
Upvotes: 1