Reputation: 3550
I want to get the words of a tweet that are not a mention (starting with @) or a hashtag (starting with #).
My code looks like this:
import re
pattern=r'(?u)\b\w\w+\b'
pattern=re.compile(pattern)
pattern.findall('this is a tweet #hashtag @mention')
With this regex the result is [this, is, tweet, hashtag, mention] (the single-letter a is dropped because \w\w+ needs at least two characters),
but I don't want the hashtag and mention in the result. I want the result to be:
this is a tweet
Note that I can't use whitespace instead of \b, because the output for .this is a tweet (note the . at the beginning) should also be [this, is, a, tweet]. \b lets a word start after any non-alphanumeric character, but if I use \s then this won't be in the results.
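To make the constraint concrete, here is a minimal sketch comparing the two kinds of delimiter on the .this is a tweet input. The whitespace-delimited pattern is only an assumed stand-in for "using \s instead of \b"; it is not taken from the code above.

```python
import re

text = '.this is a tweet'

# \b matches between the leading '.' and 't', so 'this' is found.
word_boundary = re.findall(r'\b\w+\b', text)

# Assumed whitespace-delimited variant: a word must be surrounded by
# whitespace or the string edges, so 'this' (preceded by '.') is lost.
whitespace_only = re.findall(r'(?<!\S)\w+(?!\S)', text)

print(word_boundary)    # ['this', 'is', 'a', 'tweet']
print(whitespace_only)  # ['is', 'a', 'tweet']
```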
Upvotes: 0
Views: 1160
Reputation: 11473
If you are open to solutions other than regex, you can use filter with a lambda function to get the desired result.
a = 'this is a tweet #hashtag @mention'
" ".join(filter(lambda x:x[0]!='#' and x[0]!='@' , a.split()))
'this is a tweet'
Upvotes: 0
Reputation: 67998
(?<![#@])\b\w+\b
You can use this. See the demo:
https://regex101.com/r/KzHvuy/2
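For reference, a small Python sketch of how this pattern could be applied to the inputs from the question. The names WORD_NOT_TAGGED and plain_words are hypothetical, chosen only for illustration. The negative lookbehind rejects a word start that directly follows # or @, and \b prevents a match from starting in the middle of a tagged word.

```python
import re

# Hypothetical name; the pattern itself is the one given above.
WORD_NOT_TAGGED = re.compile(r'(?<![#@])\b\w+\b')

def plain_words(tweet):
    """Return the words that are not directly preceded by '#' or '@'."""
    return WORD_NOT_TAGGED.findall(tweet)

print(plain_words('this is a tweet #hashtag @mention'))
# ['this', 'is', 'a', 'tweet']
print(plain_words('.this is a tweet'))
# ['this', 'is', 'a', 'tweet']
```

One caveat with this sketch: any word glued to a # or @ is skipped, so in user@example only user would be returned.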
Upvotes: 1