Reputation: 881
I'm trying to match all the words that do not start with a hashtag using Python Regex.
Example sentence:
This is #a test for #matching #hashtags
I would like the following to be matched: This is test for
I was able to match all the words that start with a hashtag with this: #\b\w*
Then I realized I needed the opposite.
I tried many variations similar to these without success:
Nothing works.
Upvotes: 0
Views: 574
Reputation: 163362
Lookarounds can be expensive. To avoid firing the lookbehind at every position before a match, you can swap the word boundary and the lookbehind, so that the lookbehind is only tried after a word boundary has been asserted.
\b(?<!#)\w+
\b A word boundary
(?<!#) Negative lookbehind, assert not # directly to the left of the current position
\w+ Match 1+ word characters
Upvotes: 0
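To check the ordering claim on your own machine, here is a minimal sketch that first confirms both orderings return the same words and then times them with timeit. The variable names and the repetition count are illustrative assumptions, and the timings will vary by Python version and input, so no particular result is implied.

```python
import re
import timeit

text = "This is #a test for #matching #hashtags"

# Word boundary first, then the lookbehind.
boundary_first = re.compile(r"\b(?<!#)\w+")
# Lookbehind first, then the word boundary.
lookbehind_first = re.compile(r"(?<!#)\b\w+")

# Both orderings should select exactly the same words.
same_result = boundary_first.findall(text) == lookbehind_first.findall(text)
print("same result:", same_result)

# Rough timing only; the repetition count is an arbitrary choice.
for name, pat in (("boundary first", boundary_first),
                  ("lookbehind first", lookbehind_first)):
    seconds = timeit.timeit(lambda: pat.findall(text), number=10_000)
    print(f"{name}: {seconds:.4f}s")
```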
Reputation: 942
If you want a regex, you will need a negative lookbehind:
(?<!#)\b\w+
https://regex101.com/r/aMdc7R/1
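In Python, this pattern can be applied with re.findall. A minimal sketch using the example sentence from the question (the variable names are just for illustration):

```python
import re

# Negative lookbehind: the word must not be directly preceded by '#'.
pattern = re.compile(r"(?<!#)\b\w+")

text = "This is #a test for #matching #hashtags"
words = pattern.findall(text)
print(words)  # ['This', 'is', 'test', 'for']
```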
Upvotes: 3
Reputation: 18416
A non-regex solution should be fine:
>>> text = 'This is #a test for #matching #hashtags'
>>> [word for word in text.split(' ') if not word.startswith('#')]
['This', 'is', 'test', 'for']
For a regex, you need a negative lookbehind assertion, which matches only if the current position is not preceded by the specified substring or character.
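The two approaches are not quite interchangeable once punctuation is involved: splitting on whitespace keeps punctuation attached to the words, while \w+ matches word characters only. A small sketch of the difference, where the helper names words_split and words_regex and the sample sentence are made up for illustration:

```python
import re

def words_split(text):
    # Whitespace split: drops tokens starting with '#', keeps punctuation.
    return [w for w in text.split() if not w.startswith("#")]

def words_regex(text):
    # Negative lookbehind: runs of word characters not preceded by '#'.
    return re.findall(r"(?<!#)\b\w+", text)

sample = "Hello, #world! Nice #day, right?"
print(words_split(sample))  # ['Hello,', 'Nice', 'right?']
print(words_regex(sample))  # ['Hello', 'Nice', 'right']
```

Which one is preferable depends on whether the punctuation should be kept with the words.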
Upvotes: 2