Federicofkt
Federicofkt

Reputation: 693

get string from list if it's contained in another string column

I've a simple column of strings, and a list of strings.

strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]

I need to create another column in which every row contains the string contained in the list if they are in the string_col, if it contains two or more strings from the list, then I'd like to have more rows. The result should be something like this:

 strings_col                   string
"the cat is on the table"      cat
"the cat is on the table"      table
"the dog is eating"            dog

How can I do that? thanks

Upvotes: 1

Views: 59

Answers (2)

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

You can use str.findall:

>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')

                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog

If you want you can reset_index after that:

>>> df.assign(
        string=df.strings_col.str.findall(r'|'.join(list1))
    ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog

Upvotes: 3

Umar.H
Umar.H

Reputation: 23099

try str.extractall, .groupby.agg(list) & .explode()

pat = '|'.join(list1)
# 'cat|table|dog'


df['matches'] = df['strings_col']\
                 .str.extractall(f"({pat})")\
                 .groupby(level=0).agg(list)

df_new = df.explode('matches')
print(df_new)
    
               strings_col matches
0  the cat is on the table     cat
0  the cat is on the table   table
1       the dog is eating      dog

Upvotes: 3

Related Questions