Reputation: 293
My list contains words such as ['orange', 'cool', 'app', ...],
and I want to extract each of these exact whole words (where present) from a description column in a DataFrame.
I have also attached a sample picture with code. I used str.findall()
The picture shows that it extracts add from additional and app from apple. However, I do not want that. It should only output a word when it matches the whole word.
Upvotes: 1
Views: 533
Reputation: 626806
You can fix the code by wrapping the alternation in word boundaries:
df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")
Or, if the words in list1 can contain special characters, escape them and use lookarounds instead of \b (this needs import re):
df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")
The patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like
\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
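To illustrate why the second form matters, here is a small sketch using only the re module. The word "c++" and the sample sentence are made up for the illustration and are not from the question. A \b after "+" cannot match before a space, because both characters are non-word characters, so only the lookaround version finds the word.

```python
import re

# Hypothetical word list; "c++" is added only to show the special-character case.
words = ["orange", "cool", "app", "c++"]
text = "A cool apple app, written in c++ with additional orange juice"

# Word-boundary pattern built from the purely alphanumeric words.
plain = re.compile(r"\b(" + "|".join(words[:3]) + r")\b")
plain_hits = plain.findall(text)

# Escaped words with lookarounds: no word character directly before or after.
escaped = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, words)) + r")(?!\w)")
escaped_hits = escaped.findall(text)

# \b fails for "c++": there is no word boundary between "+" and a space.
boundary_hits = re.findall(r"\b(c\+\+)\b", text)

print(plain_hits)     # ['cool', 'app', 'orange']
print(escaped_hits)   # ['cool', 'app', 'c++', 'orange']
print(boundary_hits)  # []
```

Neither pattern picks up app from apple or add from additional.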
See the regex demo. Note that .str.join(", ") is considered faster than .apply(", ".join).
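As a minimal end-to-end sketch, assuming pandas is installed: the column names text and exactmatch and the name list1 follow the code above, while the three sample rows are invented for illustration.

```python
import re
import pandas as pd

list1 = ["orange", "cool", "app"]

# Hypothetical sample data standing in for the description column.
df = pd.DataFrame({
    "text": [
        "additional apple pie",
        "a cool app with orange theme",
        "nothing here",
    ]
})

# Escape each word, then require no word character on either side.
pattern = fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)"

# findall returns a list of matches per row; str.join turns it into one string.
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")

print(df["exactmatch"].tolist())  # ['', 'cool, app, orange', '']
```

Rows without any whole-word match end up with an empty string, since joining an empty list yields "".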
Upvotes: 1