user94628

Reputation: 3721

Using Python regex for Twitter data

I'm filtering tweets in my application and want to return all tweets that have a certain word in the text. For example, if I am filtering on BBC, I want all instances of BBC, e.g. BBC, bbc, BBC1, #BBC, @bbc. How could I write the regex?

So far I'm doing:

re.compile(r'#|@[0-9]'+term, re.IGNORECASE)

term is a list of words. I want to match only the words in that list, either on their own or with an extra @, # or digits (0-9) prepended or appended.

Thanks

Upvotes: 3

Views: 523

Answers (1)

nneonneo

Reputation: 179402

Use the '\b' word-boundary anchor to match whole words:

re.compile(r'\b(?:#|@|)[0-9]*%s[0-9]*\b' % re.escape(term), re.IGNORECASE)
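Since the question says term is a list, one way to apply the same idea to several words at once is to join the escaped terms into a single alternation. The following is only a sketch: the helper name build_term_pattern and the sample tweets are made up for illustration, and it swaps '\b' for the lookarounds (?<!\w) and (?!\w) so that a leading # or @ ends up inside the match, which is a variation on the pattern above rather than the same pattern.

```python
import re

def build_term_pattern(terms):
    """Compile one case-insensitive pattern that matches any word in `terms`,
    optionally prefixed by # or @ and with digits on either side."""
    # Escape each term so characters such as '.' or '+' are matched literally.
    alternation = "|".join(re.escape(t) for t in terms)
    # (?<!\w) and (?!\w) reject matches embedded in a longer word,
    # like \b does, but still allow the match to start with # or @.
    return re.compile(
        r"(?<!\w)[#@]?[0-9]*(?:%s)[0-9]*(?!\w)" % alternation,
        re.IGNORECASE,
    )

# Hypothetical sample data, not taken from the question.
pattern = build_term_pattern(["BBC", "CNN"])
tweets = [
    "Watching BBC1 tonight",
    "#bbc is trending",
    "hello @BBC",
    "I love cnn",
    "abbcs are unrelated",
    "nothing here",
]

# Keep only the tweets that mention one of the terms.
matching = [t for t in tweets if pattern.search(t)]
```

pattern.search(tweet) returns None for tweets that do not mention any of the terms, so it can be used directly as a filter condition as in the list comprehension above.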

Upvotes: 2
