timmyt123

Reputation: 75

Regex: match up to certain characters, but still match strings that don't contain them

In the second group, I want to match words until the regex encounters a "(" or ">" symbol. But I still want it to match the words when the string contains neither symbol, as in strings 3 and 4. I am using Python.

[Image: regex not matching]

Upvotes: 1

Views: 227

Answers (3)

Mark Ransom

Reputation: 308121

When you're matching a sequence that isn't supposed to include certain characters, use a negated character set such as [^(>], which matches any character except the ones listed. I've also simplified the pattern based on your examples. The only downside is that the second group will include any trailing spaces.

r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)'

>>> import re
>>> for test in tests:  # tests: the sample strings from the question
...     print(re.findall(r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)', test))
...

[('#26,968', 'Office Products ')]
[('#13,452', 'Industrial & Scientific ')]
[('#99,999', 'baby')]
[('#888', 'office supplies')]
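If the trailing spaces matter, one option is to strip them after matching. This is a minimal sketch under assumptions: the question's real test strings aren't shown, so the tests list below is hypothetical and only imitates the output above, and rank_and_category is a hypothetical helper name.

```python
import re

# Hypothetical sample strings: the question's real data is not shown,
# so these only imitate the shape implied by the output above.
tests = [
    "Amazon Best Sellers Rank: #26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#99,999 in baby",
    "#888 in office supplies",
]

# The pattern from this answer: [^(>]* is a negated character class that
# consumes everything up to (not including) the first "(" or ">", or to the end.
PATTERN = re.compile(r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)')


def rank_and_category(text):
    """Hypothetical helper: return (rank, category) without trailing spaces, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    rank, category = m.groups()
    return rank, category.rstrip()  # drop the trailing space before "(" or ">"


for test in tests:
    print(rank_and_category(test))
```

Stripping afterwards keeps the regex simple; the alternative is to make the class lazy and stop at optional whitespace followed by "(", ">" or the end of the string.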

Upvotes: 1

Joe Walker

Reputation: 66

It may not be the best pattern, and it could match more than you intend, but if the sample you provided is representative of your data, here is another pattern to suggest:

r"([#\d,]+) in ([\w\s&]+)>?([\w\s&]*)([()\w\s\d]*)"

https://regex101.com/r/hKD6AX/2
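To see what each of the four groups captures, here is a quick check in Python. The sample strings are hypothetical (modeled on the output shown in the first answer), so treat this as a sketch rather than a test against the real data.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"([#\d,]+) in ([\w\s&]+)>?([\w\s&]*)([()\w\s\d]*)")

# Hypothetical inputs shaped like the question's data (the real strings are not shown).
samples = [
    "#26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#888 in office supplies",
]

for s in samples:
    m = PATTERN.search(s)
    # groups(): rank, text before ">" or "(", text after ">", parenthesized tail
    print(m.groups() if m else None)
```

With these inputs, the category before ">" or "(" lands in group 2 (still with its trailing space), the text after ">" in group 3, and a parenthesized tail in group 4; strings without either symbol leave groups 3 and 4 empty.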

Hope this helps!

Upvotes: 0

blhsing

Reputation: 106455

You can match the end of the string as an alternative inside a lookahead instead:

.*(#\d*\,?\d+)\s.*in\s(.*?)\s*(?=[(>]|$)

Demo: https://regex101.com/r/BliHlU/1
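The same pattern can be run from Python with re.findall. This is a sketch with hypothetical test strings, since the question's actual data isn't shown.

```python
import re

# The pattern from this answer, unchanged: the lazy (.*?) stops at the first point
# where optional whitespace is followed by "(" or ">" or by the end of the string ($).
PATTERN = r'.*(#\d*\,?\d+)\s.*in\s(.*?)\s*(?=[(>]|$)'

# Hypothetical test strings; the question's actual data is not shown.
tests = [
    "#26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#99,999 in baby",
    "#888 in office supplies",
]

for test in tests:
    print(re.findall(PATTERN, test))
```

Because the \s*  sits outside the capturing group, the trailing space before "(" or ">" is not captured. One caveat: the .* before in\s is greedy, so a category that itself contains "in " (for example "#5 in Books > Cooking in Bulk") is captured from its last "in ", giving "Bulk"; using \s+in\s instead of \s.*in\s avoids that.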

Upvotes: 2
