Reputation: 16462
I want to check if both the #python and #conf hashtags exist in the following tweets:
tweets = ['conferences you would like to attend #python #conf',
'conferences you would like to attend #conf #python']
I tried the code below, but it doesn't match either tweet.
import re
for tweet in tweets:
    if re.search(r'^(?=.*\b#python\b)(?=.*\b#conf\b).*$', tweet):
        print(tweet)
If I remove the # sign from the regex, both tweets match, but it also matches tweets that contain python and conf as plain words rather than hashtags.
Upvotes: 1
Views: 1178
Reputation: 369494
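\b matches only at the beginning or end of a word. # is not a word character, so in your pattern \b# can match only when # directly follows a word character. After a space (or at the start of the string) there is no word boundary in front of #, so the lookahead fails. From the re module documentation: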
\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string
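To see that rule in action, here is a minimal sketch (the sample string and variable names are made up for illustration) comparing a pattern with a leading \b in front of # against one without it:

```python
import re

text = 'attend #python #conf'

# The '#' is preceded by a space. Both are non-word characters, so there is
# no \w/\W transition in front of '#' and \b cannot match there.
with_leading_boundary = re.search(r'\b#python', text)

# Without the leading \b the hashtag is found. The trailing \b still works
# because 'n' is a word character and the following space is not.
without_leading_boundary = re.search(r'#python\b', text)

print(with_leading_boundary)             # None
print(without_leading_boundary.group())  # #python
```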
Try the following regular expression (the ^ and .*$ are unnecessary):
(?=.*#python\b)(?=.*#conf\b)
>>> tweets = ['conferences you would like to attend #python #conf',
... 'conferences you would like to attend #conf #python',
... 'conferences you would like to attend #conf #snake']
>>>
>>> import re
>>> for tweet in tweets:
...     if re.search(r'(?=.*#python\b)(?=.*#conf\b)', tweet):
...         print(tweet)
...
conferences you would like to attend #python #conf
conferences you would like to attend #conf #python
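If you need this for an arbitrary list of hashtags, the same idea could be wrapped in a small helper. This is only a sketch: has_all_hashtags and its tags parameter are hypothetical names, and the (?<!\w) lookbehind is an extra assumption on my part (it rejects a # glued to a preceding word, such as abc#python, which the pattern above would accept).

```python
import re

def has_all_hashtags(tweet, tags):
    """Return True if every tag in `tags` appears in `tweet` as a whole hashtag."""
    # (?<!\w) rejects a '#' attached to a preceding word character;
    # the trailing \b rejects longer tags such as '#pythonic'.
    return all(
        re.search(r'(?<!\w)#' + re.escape(tag) + r'\b', tweet)
        for tag in tags
    )

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python',
          'conferences you would like to attend #conf #snake',
          'a python conf without hashtags']

matching = [t for t in tweets if has_all_hashtags(t, ['python', 'conf'])]
for tweet in matching:
    print(tweet)
```

Only the first two tweets are printed. Checking each tag with its own re.search avoids building one long lookahead pattern, at the cost of scanning the tweet once per tag.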
Upvotes: 1