flowfree

Reputation: 16462

Regular expression for checking if hashtags exist in a tweet

I want to check if both #python and #conf hashtags exist in the following tweets:

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python']

I've tried the code below, but it doesn't match either tweet.

import re
for tweet in tweets:
    if re.search(r'^(?=.*\b#python\b)(?=.*\b#conf\b).*$', tweet):
        print(tweet)

If I remove the # sign from the regex, both tweets match, but it will also match tweets that contain python and conf as plain words rather than hashtags.

Upvotes: 1

Views: 1178

Answers (1)

falsetru

Reputation: 369494

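\b matches at the beginning or end of a word. # is not a word character, so \b#python can only match when the # is directly preceded by a word character; in your tweets it is preceded by a space, so the pattern fails. According to the re module documentation: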

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string
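
A small sketch of what that definition means for the pattern in the question. The variable names are only for illustration; the tweet is the first one from the question, and 'abc#python' is a made-up string to show the one case where \b before # does match.

```python
import re

tweet = 'conferences you would like to attend #python #conf'

# \b needs a word character on exactly one side. Before '#python' the
# neighbours are ' ' and '#', both non-word characters, so \b fails there.
with_leading_b = re.search(r'\b#python\b', tweet)
without_leading_b = re.search(r'#python\b', tweet)

print(with_leading_b)             # None
print(without_leading_b.group())  # #python

# \b before '#' succeeds only when a word character precedes it.
glued = re.search(r'\b#python\b', 'abc#python')
print(glued is not None)          # True
```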

Try the following regular expression, which drops the \b before each # (the ^ and .*$ are also unnecessary with re.search):

(?=.*#python\b)(?=.*#conf\b)

>>> tweets = ['conferences you would like to attend #python #conf',
...           'conferences you would like to attend #conf #python',
...           'conferences you would like to attend #conf #snake']
>>>
>>> import re
>>> for tweet in tweets:
...     if re.search(r'(?=.*#python\b)(?=.*#conf\b)', tweet):
...         print(tweet)
...
conferences you would like to attend #python #conf
conferences you would like to attend #conf #python
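
If you need this for an arbitrary list of tags, one possible variation is to test each tag separately instead of chaining lookaheads. This is only a sketch: has_all_hashtags is a hypothetical helper name, and the (?<!\w) lookbehind is an extra assumption on my part that a hashtag glued to a preceding word (such as 'abc#python') should not count.

```python
import re

def has_all_hashtags(tweet, tags):
    """Return True if every tag in `tags` appears in `tweet` as a hashtag."""
    return all(
        # (?<!\w) rejects a '#' glued to a preceding word character;
        # the trailing \b rejects longer tags such as '#python3'.
        re.search(r'(?<!\w)#' + re.escape(tag) + r'\b', tweet)
        for tag in tags
    )

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python',
          'conferences you would like to attend #conf #snake']

for tweet in tweets:
    if has_all_hashtags(tweet, ['python', 'conf']):
        print(tweet)
```

This prints the first two tweets, the same as the lookahead version above. re.escape only matters if a tag could contain regex metacharacters.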

Upvotes: 1
