Reputation: 3550
I have a regex that returns the words in a tweet: it excludes @mentions and includes hashtags, but strips the hash sign (#) from them.
import re
pattern=r'(?u)(?<![@])\b\w\w+\b'
pattern=re.compile(pattern)
pattern.findall('this is a tweet #hashtag @mention')
This returns
['this', 'is', 'tweet', 'hashtag']
What I need is a modification to this regex that keeps the hash sign on hashtags, so it should return:
['this', 'is', 'tweet', '#hashtag']
Note that my question is different from returning just @mentions and #hashtags: I want both regular words and hashtags, but I don't want @mentions.
Upvotes: 0
Views: 1308
Reputation: 240
Adding '#?' before the first \b makes a leading hash sign optional, so the pattern matches words preceded by zero or one '#' and includes the '#' in the match.
import re
pattern=r'(?u)(?<![@])#?\b\w\w+\b'
pattern=re.compile(pattern)
results = pattern.findall('this is a tweet #hashtag @mention')
print(results)
Resulting in:
['this', 'is', 'tweet', '#hashtag']
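If it helps, here is a minimal sketch that wraps the same pattern in a helper and checks a couple of edge cases. The names TOKEN_RE and extract_tokens are my own, not part of any library, and I dropped (?u) on the assumption that you are on Python 3, where str patterns are already Unicode-aware.

```python
import re

# Same pattern as above, without (?u): assumed unnecessary on Python 3 str patterns.
# (?<!@)  -> the match must not start right after an '@'
# #?      -> optional leading hash sign, kept in the match
# \b\w\w+\b -> a whole word of at least two word characters
TOKEN_RE = re.compile(r'(?<!@)#?\b\w\w+\b')


def extract_tokens(text):
    """Return words and #hashtags (two or more characters), skipping @mentions."""
    return TOKEN_RE.findall(text)


print(extract_tokens('this is a tweet #hashtag @mention'))
# ['this', 'is', 'tweet', '#hashtag']

# One-character words, hashtags, and mentions are all skipped,
# because \w\w+ needs at least two word characters.
print(extract_tokens('#a @b ok'))
# ['ok']
```

Keep in mind that the two-character minimum comes from the original pattern; if you also want one-letter words or hashtags, \w+ instead of \w\w+ should do it.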
Upvotes: 2