Awsmike

Reputation: 881

Python Regex match everything except words that start with a hashtag

I'm trying to match all the words that do not start with a hashtag using a Python regex.

Example sentence:

    This is #a test for #matching #hashtags

I would like the following to be matched: This is test for

I was able to match all the words that start with a hashtag with this: #\b\w*

Then I realized I needed the opposite.

I tried many variations similar to these without success:

Nothing works.

Upvotes: 0

Views: 574

Answers (3)

The fourth bird

Reputation: 163362

Lookarounds can be expensive. To avoid evaluating the lookbehind at every position in the string, you can swap the word boundary and the lookbehind, so that the lookbehind is only evaluated at positions where the word boundary has already been asserted.

\b(?<!#)\w+
  • \b A word boundary
  • (?<!#) Negative lookbehind, assert not # directly to the left of the current position
  • \w+ Match 1+ word characters
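
As a quick check, here is a minimal sketch that runs this pattern and the `(?<!#)\b\w+` ordering from the other answer against the sentence in the question. Both `\b` and the lookbehind are zero-width assertions at the same position, so the order should only affect how much work the engine does, not what is matched. The variable names are made up for illustration.

```python
import re

text = 'This is #a test for #matching #hashtags'

# Word boundary first, then the negative lookbehind.
boundary_first = re.findall(r'\b(?<!#)\w+', text)

# Negative lookbehind first, then the word boundary.
lookbehind_first = re.findall(r'(?<!#)\b\w+', text)

print(boundary_first)                      # ['This', 'is', 'test', 'for']
print(boundary_first == lookbehind_first)  # True
```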

Upvotes: 0

flaxon

Reputation: 942

If you want a regex, you will need a negative lookbehind:

(?<!#)\b\w+

https://regex101.com/r/aMdc7R/1
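
To see why the `\b` is needed alongside the lookbehind, here is a small sketch comparing the pattern with and without it on the sentence from the question. Without the boundary, the engine can start a match one character into a hashtag word, where the preceding character is a letter rather than `#`. The variable names are only for illustration.

```python
import re

text = 'This is #a test for #matching #hashtags'

# With the word boundary, a match can only start at the beginning of a word.
with_boundary = re.findall(r'(?<!#)\b\w+', text)

# Without it, the match can start at the second letter of a hashtag word.
without_boundary = re.findall(r'(?<!#)\w+', text)

print(with_boundary)     # ['This', 'is', 'test', 'for']
print(without_boundary)  # ['This', 'is', 'test', 'for', 'atching', 'ashtags']
```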

Upvotes: 3

ThePyGuy

Reputation: 18416

A non-regex solution should be fine:

>>> text = 'This is #a test for #matching #hashtags'
>>> [word for word in text.split(' ') if not word.startswith('#')]
['This', 'is', 'test', 'for']

For a regex, you need something like a negative lookbehind assertion, which matches only if the current position is not preceded by the specified substring or character.
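
The two approaches are not interchangeable on every input, so it may help to compare them on a sentence with punctuation and a `#` in the middle of a token. This is only a sketch: the sample sentence is invented, and the regex is the `(?<!#)\b\w+` lookbehind pattern rather than anything specific to this answer.

```python
import re

# Made-up sample with trailing punctuation and a '#' inside a token.
text = 'Tests pass, see #ci and issue#42.'

# Split-based filter: keeps whole whitespace-separated tokens,
# including attached punctuation and any '#' that is not at the start.
by_split = [word for word in text.split() if not word.startswith('#')]

# Lookbehind regex: keeps only runs of word characters that are
# not directly preceded by '#'.
by_regex = re.findall(r'(?<!#)\b\w+', text)

print(by_split)  # ['Tests', 'pass,', 'see', 'and', 'issue#42.']
print(by_regex)  # ['Tests', 'pass', 'see', 'and', 'issue']
```

Which result is preferable depends on whether punctuation should stay attached and whether a `#` inside a token counts as starting a hashtag.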

Upvotes: 2
