Reputation: 1044
I'm attempting to get full words or hashtags from a string, but it seems I'm applying the 'optional character' quantifier ? incorrectly in my regex.
Here is my code:
print re.findall(r'(#)?\w*', text)
print re.findall(r'[#]?\w*', text)
So for the text 'this is a sentence talking about this, #this, #that, #etc'
it should return matches for 'this' and '#this'.
Yet it returns a list containing empty strings as well as other unexpected items.
What is wrong with the regex?
EDIT:
I'm actually trying to match whole spam words, and I seem to have confused myself...
s = 'spamword'
print re.findall(r'(#)?'+s, text)
I need to match the whole word, not parts of a word...
Upvotes: 1
Views: 42
Reputation: 28
The other answers explain why this happens. Here is one piece of code that should work:
>>> re.findall(r'#?\w+\b', text)
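To see why the original patterns misbehave, here is a minimal sketch in Python 3 that runs them against the sample sentence from the question. The variable names (loose, grouped, tokens) are only illustrative. It assumes the goal is to collect every word or hashtag in the text.

```python
import re

text = 'this is a sentence talking about this, #this, #that, #etc'

# \w* can match zero characters, so findall also reports empty matches
# at positions where no word starts (spaces, commas, end of string).
loose = re.findall(r'#?\w*', text)

# With a capturing group, findall returns the group's text rather than
# the whole match, so only '#' or '' comes back.
grouped = re.findall(r'(#)?\w+', text)

# \w+ requires at least one word character, and no group is captured,
# so each result is a whole word or hashtag.
tokens = re.findall(r'#?\w+', text)
print(tokens)
```

The trailing \b in the answer's pattern is harmless but not strictly needed here, since \w+ is greedy and already stops at the end of the word.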
Upvotes: 0
Reputation: 785246
You can use a word boundary in your regex:
s = 'spamword'
re.findall(r'#?' + s + r'\b', text)
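Note that the trailing \b only guards the end of the word. If the spam word can also appear at the end of a longer word, one possible variation (a sketch, not the only approach) is to add a negative lookbehind so that no word character may come directly before the match. The sample text below is made up for illustration, and re.escape is added on the assumption that s might contain regex metacharacters.

```python
import re

# Hypothetical sample text, chosen to exercise the edge cases.
text = 'spamword, #spamword, notspamword, spamwords, #spamword!'
s = 'spamword'

# Trailing boundary only: still matches the tail of 'notspamword'.
trailing_only = re.findall(r'#?' + re.escape(s) + r'\b', text)

# (?<!\w) rejects matches that directly follow a word character;
# \b rejects matches that are followed by one.
whole_word = re.findall(r'(?<!\w)#?' + re.escape(s) + r'\b', text)

print(trailing_only)
print(whole_word)
```

A plain leading \b would not work in front of #?, because there is no word boundary between a space and '#'; the lookbehind avoids that problem.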
Upvotes: 1