Ash

Reputation: 3550

Python regex to get all the words in a tweet that are not @mention or #hashtag

I want to get the words of a tweet that are not a mention (starting with @) or a hashtag (starting with #).

My code is:

import re
pattern=r'(?u)\b\w\w+\b'
pattern=re.compile(pattern)
pattern.findall('this is a tweet #hashtag @mention')

The result with this regex is this is a tweet hashtag mention

but I don't want the hashtag and mention in the result. I want the result to be:

this is a tweet

Note that I can't use whitespace instead of \b, because the output for .this is a tweet (note the . at the beginning) should also be [this, is, a, tweet]. \b lets a word start after any non-alphanumeric character, but if I use \s instead, then this won't be in the results.

Upvotes: 0

Views: 1160

Answers (2)

Anil_M

Reputation: 11473

If you are open to solutions other than regex, you can use filter with a lambda function to get the desired result.

a = 'this is a tweet #hashtag @mention'
" ".join(filter(lambda x:x[0]!='#' and x[0]!='@' , a.split()))

'this is a tweet'
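The same idea can be wrapped in a small helper. This is only a sketch: the name strip_tags_and_mentions is made up for illustration, and it uses str.startswith with a tuple of prefixes instead of indexing x[0].

```python
def strip_tags_and_mentions(tweet):
    """Drop whitespace-separated tokens that start with '#' or '@'.

    Hypothetical helper name; not part of any library.
    """
    kept = [tok for tok in tweet.split() if not tok.startswith(('#', '@'))]
    return " ".join(kept)


print(strip_tags_and_mentions('this is a tweet #hashtag @mention'))
print(strip_tags_and_mentions('.this is a tweet'))
```

Because this splits on whitespace only, leading punctuation stays attached to the token, so '.this' is returned as '.this' rather than 'this'. If that case matters, a regex-based approach is probably a better fit.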

Upvotes: 0

vks

Reputation: 67998

(?<![#@])\b\w+\b

You can use this. The negative lookbehind (?<![#@]) rejects any word that is immediately preceded by # or @. See demo:

https://regex101.com/r/KzHvuy/2
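For reference, here is a minimal sketch of how this pattern could be used from Python. The names WORD_NOT_TAGGED and plain_words are made up for illustration. It assumes Python 3, where str patterns are Unicode-aware by default, so the (?u) flag from the question is not needed.

```python
import re

# (?<![#@])  negative lookbehind: the word must not be directly preceded by # or @
# \b\w+\b    a whole word of one or more word characters
WORD_NOT_TAGGED = re.compile(r'(?<![#@])\b\w+\b')


def plain_words(tweet):
    """Return the words of a tweet that are not hashtags or mentions."""
    return WORD_NOT_TAGGED.findall(tweet)


print(plain_words('this is a tweet #hashtag @mention'))
print(plain_words('.this is a tweet'))
```

Both calls should give ['this', 'is', 'a', 'tweet']. The leading . in the second input is not a word character, so \b still matches right before this.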

Upvotes: 1
