Bernhard

Reputation: 85

Checking string for words in list stored in Pandas Dataframe

I have a pandas DataFrame with a column called contains_and, where each cell holds a list of strings. I want to select the rows whose words in contains_and are all contained in a given string, e.g.:

example: str = "I'm really satisfied with the quality and the price of product X"

df: pd.DataFrame = pd.DataFrame({"columnA": [1,2], "contains_and": [["price","quality"],["delivery","speed"]]})

resulting in a dataframe like this:

   columnA       contains_and
0        1   [price, quality]
1        2  [delivery, speed]

Now, I would like to select only the first row (index 0, where columnA is 1), since example contains all the words in that row's contains_and list.

My initial instinct was to do the following:

df.loc[
    all([word in example for word in df["contains_and"]])
    ]

However, doing that results in the following error:

TypeError: 'in <string>' requires string as left operand, not list
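The error comes from the fact that iterating over df["contains_and"] yields each row's whole list, so word is a list such as ["price", "quality"], and a list cannot be the left operand of `in` on a string. Note also that all(...) returns a single bool rather than one value per row, so even without the error it could not serve as a row mask for df.loc. A minimal reproduction using the data from the question:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

# Iterating over a Series yields its values: here, one list per row
first = next(iter(df["contains_and"]))
print(type(first))  # <class 'list'>

try:
    # A list on the left of `in <string>` raises TypeError
    _ = first in example
except TypeError as err:
    msg = str(err)

print(msg)  # 'in <string>' requires string as left operand, not list
```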

I'm not quite sure how to best do this, but it seems like something that shouldn't be all too difficult. Does someone know of a good way to do this?

Upvotes: 1

Views: 1158

Answers (3)

Ivan Calderon

Reputation: 620

Based on @Nk03's answer, you could also try:

df = df[df.contains_and.apply(lambda x: len([q for q in x if q in example]) == len(x))]

In my opinion it is more intuitive to check whether the words are in example, rather than the other way around, as your first attempt does.
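Whether the per-row reduction uses any or all matters: any keeps a row as soon as one of its words occurs in example, while all (what the question asks for) requires every word to occur. A small sketch comparing the two; the third row ["price", "delivery"] is a hypothetical addition, not part of the question's data:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({
    "columnA": [1, 2, 3],
    # row 3 is hypothetical: only one of its two words occurs in example
    "contains_and": [["price", "quality"], ["delivery", "speed"], ["price", "delivery"]],
})

# Keep rows where at least one word appears in example
any_rows = df[df.contains_and.apply(lambda x: any(q in example for q in x))]

# Keep rows where every word appears in example
all_rows = df[df.contains_and.apply(lambda x: all(q in example for q in x))]

print(any_rows.columnA.tolist())  # [1, 3]
print(all_rows.columnA.tolist())  # [1]
```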

Upvotes: 0

Mustafa Aydın

Reputation: 18316

Another way is to explode the lists of candidate words and check (per row) whether they are all among the words of example, which are obtained with str.split:

# a Series of words
ex = pd.Series(example.split())

# boolean array reduced with `all`
to_keep = df["contains_and"].explode().isin(ex).groupby(level=0).all()

# keep only "True" rows
new_df = df[to_keep]

to get

>>> new_df

   columnA      contains_and
0        1  [price, quality]
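A self-contained version of this approach with the question's data. Because it compares against whole tokens from example.split(), it matches whole words only, so a word followed by punctuation (e.g. "price,") would not match as-is. The punctuation stripping below is an added assumption for illustration, not part of the answer above:

```python
import string
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

# Tokenize the sentence; stripping leading/trailing punctuation is an
# assumption so that a token like "price," would still match "price"
ex = pd.Series([w.strip(string.punctuation) for w in example.split()])

# One row per candidate word; each original index label repeats per list element
exploded = df["contains_and"].explode()

# True/False per word, then reduce back to one value per original row
to_keep = exploded.isin(ex).groupby(level=0).all()

new_df = df[to_keep]
print(new_df)
```

One behavioral difference from the apply-based answers: an empty list explodes to a single NaN, which isin reports as False, so this version drops such rows, whereas all(...) over an empty list returns True and keeps them.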

Upvotes: 1

Nk03

Reputation: 14949

One way:

df = df[df.contains_and.apply(lambda x: all(i in example for i in x))]

OUTPUT:

   columnA      contains_and
0        1  [price, quality]
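A self-contained version of this answer, without the trailing positional argument to Series.apply (in Series.apply the second positional parameter is convert_dtype, not an axis). Note that `i in example` is a substring test, so a hypothetical word like "rice" would also match because it occurs inside "price"; the explode/isin answer compares whole tokens instead:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

# For each row's list, check that every word occurs somewhere in example
mask = df.contains_and.apply(lambda words: all(w in example for w in words))
result = df[mask]
print(result)

# Substring caveat: "rice" is found inside "price"
print("rice" in example)  # True
```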

Upvotes: 1
