Reputation: 4301
I want to use fuzzy matching to check whether a DataFrame column contains keywords.
However, it is very slow with apply.
Are there any faster methods?
Can we use str or re?
import regex
result = df['sentence'].apply(lambda x: regex.compile('(keyword){e<4}').findall(x)) #slow
Thank you very much.
Upvotes: 2
Views: 460
Reputation: 402483
Why are you compiling inside the apply? That defeats the purpose of compiling. Also, the best way to speed up an apply call is to not use apply.
Without more context on what you're actually trying to match, I present to you:
import regex

p = regex.compile('(keyword){e<4}')  # compile once, outside the loop
result = [p.findall(x) for x in df['sentence']]
My tests show that a list comprehension with a precompiled regex outperforms the str methods. Take that with a grain of salt, though, because it always depends on your data and what you're trying to match.
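If you want to check this on your own data, here is a rough timing sketch. The sample DataFrame is made up for illustration, and the numbers will vary with your data and machine, so treat it as a starting point rather than a benchmark:

```python
import timeit

import pandas as pd
import regex  # third-party module that supports fuzzy matching

# Hypothetical sample data; substitute your own column
df = pd.DataFrame({'sentence': ['a keyword here', 'a keywrd here', 'zzz'] * 200})

p = regex.compile('(keyword){e<4}')  # compiled once, outside any loop


def with_apply():
    return df['sentence'].apply(p.findall)


def with_listcomp():
    return [p.findall(x) for x in df['sentence']]


# Both approaches should return the same matches
same_result = with_apply().tolist() == with_listcomp()

t_apply = timeit.timeit(with_apply, number=3)
t_listcomp = timeit.timeit(with_listcomp, number=3)
print(f'same result: {same_result}')
print(f'apply: {t_apply:.3f}s, list comprehension: {t_listcomp:.3f}s')
```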
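You may want to consider using p.search (that is, the regex module's search, since the standard re module does not support fuzzy patterns like {e<4}) instead of findall if you just want a single match (for more performance).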
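As a minimal sketch of that last suggestion, assuming all you need is a yes/no flag per row: the sample data and the has_match column name are hypothetical. Note that it calls search on the pattern compiled with the regex module, because the fuzzy {e<4} syntax is specific to that module.

```python
import pandas as pd
import regex  # third-party module that supports fuzzy matching

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'sentence': ['this has a keyword', 'this has a keywrd', 'nothing here'],
})

p = regex.compile('(keyword){e<4}')  # fewer than 4 errors allowed

# search returns a match object or None, and stops at the first match
df['has_match'] = [p.search(x) is not None for x in df['sentence']]
print(df)
```

If you need the matched text rather than a flag, keep the match object and call .group() on it when it is not None.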
Upvotes: 2