Reputation: 693
I've a simple column of strings, and a list of strings.
strings_col
"the cat is on the table"
"the dog is eating"
list1 = ["cat", "table", "dog"]
I need to create another column in which every row contains the string contained in the list if they are in the string_col, if it contains two or more strings from the list, then I'd like to have more rows. The result should be something like this:
strings_col string
"the cat is on the table" cat
"the cat is on the table" table
"the dog is eating" dog
How can I do that? thanks
Upvotes: 1
Views: 59
Reputation: 15872
You can use str.findall
:
>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')
strings_col string
0 "the cat is on the table" cat
0 "the cat is on the table" table
1 "the dog is eating" dog
If you want you can reset_index
after that:
>>> df.assign(
string=df.strings_col.str.findall(r'|'.join(list1))
).explode('string').reset_index(drop=True)
strings_col string
0 "the cat is on the table" cat
1 "the cat is on the table" table
2 "the dog is eating" dog
Upvotes: 3
Reputation: 23099
try str.extractall
, .groupby.agg(list)
& .explode()
pat = '|'.join(list1)
# 'cat|table|dog'
df['matches'] = df['strings_col']\
.str.extractall(f"({pat})")\
.groupby(level=0).agg(list)
df_new = df.explode('matches')
print(df_new)
strings_col matches
0 the cat is on the table cat
0 the cat is on the table table
1 the dog is eating dog
Upvotes: 3