1Up

Reputation: 1044

Matching an optional '#' does not seem to be working properly

I'm attempting to get full words or hashtags from a string, but it seems I'm applying the 'optional character' ? quantifier incorrectly in my regex.

Here is my code:

print re.findall(r'(#)?\w*', text)

print re.findall(r'[#]?\w*', text)

Given the text 'this is a sentence talking about this, #this, #that, #etc',

it should return matches for 'this' and '#this'.

Yet it seems to be returning a list with empty strings as well as other random things.

What is wrong with the regex?

EDIT:

I'm attempting to match whole spam words, and I seem to have confused myself...

s = 'spamword'
print re.findall(r'(#)?'+s, text)

I need to match the whole word, and not word parts...

Upvotes: 1

Views: 42

Answers (2)

user4012377

Reputation: 28

The answers above explain why. Here is one piece of code that should work:

>>> re.findall(r'#?\w+\b', text)
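As a rough illustration of why the patterns in the question misbehave, here is a small sketch that assumes `text` is the sample sentence from the question. The variable names are only for illustration. It compares the two original patterns with a version that requires at least one word character.

```python
import re

# Assumption: the sample sentence from the question.
text = 'this is a sentence talking about this, #this, #that, #etc'

# \w* may match zero characters, so findall also reports empty matches
# at positions where no word starts (spaces, commas, end of string).
with_star = re.findall(r'#?\w*', text)

# With a capturing group, findall returns the group's text rather than
# the whole match: '#' where the group matched, '' where it did not.
with_group = re.findall(r'(#)?\w*', text)

# Requiring at least one word character removes the empty matches.
with_plus = re.findall(r'#?\w+', text)
print(with_plus)
# ['this', 'is', 'a', 'sentence', 'talking', 'about', 'this', '#this', '#that', '#etc']
```

The trailing \b is harmless but not strictly needed here, since \w+ is greedy and already stops at the end of the word.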

Upvotes: 0

anubhava

Reputation: 785246

You can use a word boundary in your regex:

s = 'spamword'
re.findall(r'#?' + s + r'\b', text)
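The trailing \b stops the match from ending inside a longer word. If the start of the word should be anchored as well, one possible variation is sketched below. It assumes a spam word may also be hidden at the end of a longer word, and the helper name `find_spam` and the sample string are hypothetical. A plain leading \b would not work before '#', because '#' is not a word character, so the sketch uses a negative lookbehind instead.

```python
import re

def find_spam(word, text):
    # Hypothetical helper, not part of the original answer.
    # re.escape guards against regex metacharacters in the word.
    # (?<!\w) rejects matches that start in the middle of a longer word;
    # \b rejects matches that end in the middle of one.
    pattern = r'(?<!\w)#?' + re.escape(word) + r'\b'
    return re.findall(pattern, text)

sample = 'spamword #spamword notspamword spamwords'
print(find_spam('spamword', sample))
# ['spamword', '#spamword']
```

Without the lookbehind, 'notspamword' in the sample would also yield a match for 'spamword'.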

Upvotes: 1
