Python match word to word list after removing repeating characters

Question

I have a list of words with positive and negative sentiment, e.g. ['happy', 'sad'].

When processing tweets, I collapse repeated characters so that at most two consecutive copies of a character remain:

happpppyyy -> happyy

saaad -> saad
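The normalisation described above can be sketched with a backreference regex. This is an assumption about how the asker does it, and `squeeze_repeats` and its `max_run` parameter are hypothetical names:

```python
import re

def squeeze_repeats(word, max_run=2):
    """Hypothetical helper: collapse any run of one character to at most max_run copies."""
    # (.) captures a character; \1{max_run,} needs at least max_run further copies,
    # so only runs longer than max_run are matched and shortened. Assumes max_run >= 1.
    pattern = r"(.)\1{%d,}" % max_run
    return re.sub(pattern, lambda m: m.group(1) * max_run, word)

print(squeeze_repeats("happpppyyy"))  # happyy
print(squeeze_repeats("saaad"))       # saad
```

Runs of exactly `max_run` characters are left untouched, so "saad" stays "saad".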

Checking whether, e.g., saad is in the word list should now return True, because it is similar to sad.

How can I implement this behaviour?

Jean-François Fabre · Accepted Answer

I would build the regular expressions dynamically, turning a word such as:

happy

into

h+a+p+p+y+
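A minimal sketch of that transformation as a standalone helper. `word_to_pattern` is a hypothetical name, and `re.escape` is an addition beyond the answer so that punctuation in a word is matched literally:

```python
import re

def word_to_pattern(word):
    """Hypothetical helper: turn e.g. 'happy' into the compiled regex h+a+p+p+y+."""
    # Each character becomes "<char>+", i.e. one or more copies of it.
    # re.escape keeps regex metacharacters (such as '.') literal.
    return re.compile("".join(re.escape(c) + "+" for c in word))

pattern = word_to_pattern("happy")
print(pattern.pattern)                    # h+a+p+p+y+
print(bool(pattern.fullmatch("happyy")))  # True
print(bool(pattern.fullmatch("hapy")))    # False: the pattern needs at least two p's
```

Because every character may repeat, the pattern also accepts words that were never squeezed, e.g. "haaaaappy".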

Pass a list of "happy" words to this:

import re

# Turn each word into a pattern in which every character may repeat: happy -> h+a+p+p+y+
re_list = [re.compile("".join(re.escape(c) + "+" for c in x)) for x in ['happy', 'glad']]

Then test it, using any() to return True if any "happy" regex matches:

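for w in ["haaappy", "saad", "glaad"]:
    # fullmatch requires the whole word to match, so "gladiator" is not accepted just because it starts with "glad"
    print(w, any(x.fullmatch(w) for x in re_list))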

result:

haaappy True
saad False
glaad True
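A possible alternative, not part of the original answer: reduce both the sentiment words and the tweet words to a "skeleton" in which every run of a character is collapsed to one copy, then use a set lookup instead of trying every regex. `skeleton` and `sentiment_words` are hypothetical names:

```python
import re

def skeleton(word):
    """Hypothetical helper: collapse every run of one character to a single copy ('happyy' -> 'hapy')."""
    return re.sub(r"(.)\1+", r"\1", word)

# Precompute the skeletons of the sentiment words once (example word list).
sentiment_words = ["happy", "sad", "glad"]
skeletons = {skeleton(w) for w in sentiment_words}

for w in ["haaappy", "saad", "glaad", "mad"]:
    # A single set lookup per word instead of one regex match per sentiment word.
    print(w, skeleton(w) in skeletons)
```

This prints True for the first three words and False for "mad". The trade-off is that it is looser than the regex approach: "hapy" also maps to "happy" here, and different words with the same skeleton become indistinguishable.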
