Reputation: 2790
I have a list of words with positive and negative sentiment, e.g. ['happy', 'sad'].
When processing tweets, I remove repeated characters like this (allowing at most 2 consecutive repetitions):
happpppyyy -> happyy
saaad -> saad
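The repeat-limiting step described above can be done with a single backreference substitution. This is a minimal sketch, assuming "at most 2 repetitions" means that any run of 3 or more identical characters is cut down to exactly 2. The helper name limit_repeats is hypothetical:

```python
import re

# (.) captures one character and \1{2,} matches two or more further copies
# of it, so every run of 3+ identical characters is replaced by two copies.
_REPEATS = re.compile(r"(.)\1{2,}")

def limit_repeats(text):
    # Hypothetical helper name, not from the original post.
    return _REPEATS.sub(r"\1\1", text)

print(limit_repeats("happpppyyy"))  # happyy
print(limit_repeats("saaad"))       # saad
```

Runs of exactly two characters are left untouched, so words with legitimate doubled letters (the "pp" in "happy") survive the normalization.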
The check whether, e.g., saad is part of the word list should now return True, because it is similar to sad.
How can I implement this behaviour?
Upvotes: 0
Views: 67
Reputation: 140256
I would build regular expressions dynamically, turning a word like happy into h+a+p+p+y+ (each letter followed by +, meaning "one or more of this letter").
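The word-to-pattern transformation can be written as a small function. This sketch additionally wraps each character in re.escape (an assumption that only matters if a word list ever contains regex metacharacters) and uses fullmatch so that the whole tweet word must match, not just a prefix. The name word_to_pattern is hypothetical:

```python
import re

def word_to_pattern(word):
    # Hypothetical helper: each character becomes "<char>+",
    # i.e. one or more repetitions of that character.
    return "".join(re.escape(c) + "+" for c in word)

pattern = re.compile(word_to_pattern("happy"))
print(pattern.pattern)                       # h+a+p+p+y+
print(bool(pattern.fullmatch("happyy")))     # True
print(bool(pattern.fullmatch("hapy")))       # False: needs at least two p's
print(bool(pattern.fullmatch("happyness")))  # False: trailing text is rejected
```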
Pass a list of "happy" words to this:
```python
import re
# Turn each word into a pattern like "h+a+p+p+y+" (every letter may repeat).
re_list = [re.compile("".join(["{}+".format(c) for c in x])) for x in ['happy', 'glad']]
```
Then test it, using any to return True if any happy regex matches:
```python
for w in ["haaappy", "saad", "glaad"]:
    # fullmatch: the whole word must match (re.match would also accept "happyness")
    print(w, any(x.fullmatch(w) for x in re_list))
```
Result:
haaappy True
saad False
glaad True
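To answer the original question directly (saad should be found because sad is in the list), the same idea can be applied to both the positive and the negative word lists. A sketch, assuming the word lists are plain Python lists; build_patterns and sentiment are hypothetical names:

```python
import re

def build_patterns(words):
    # One compiled "c1+c2+...cn+" pattern per word (hypothetical helper).
    return [re.compile("".join(re.escape(c) + "+" for c in w)) for w in words]

POSITIVE = build_patterns(["happy", "glad"])
NEGATIVE = build_patterns(["sad"])

def sentiment(word):
    # fullmatch requires the whole word to match, so "happyness"
    # is not mistaken for "happy".
    if any(p.fullmatch(word) for p in POSITIVE):
        return "positive"
    if any(p.fullmatch(word) for p in NEGATIVE):
        return "negative"
    return None

for w in ["haaappy", "saad", "glaad", "hello"]:
    print(w, sentiment(w))
```

Each lookup scans every pattern, so the cost grows linearly with the size of the word lists. For large lists, collapsing every run of a repeated letter to a single letter on both sides and using a set lookup would be cheaper, at the price of conflating words that differ only in doubled letters (e.g. "loose" and "lose").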
Upvotes: 3