Reputation: 75
In the second group, I want to match words until a ( or > symbol is encountered. However, I still want it to match the words when neither symbol is present, as in strings 3 and 4. I am using Python.
Upvotes: 1
Views: 227
Reputation: 308121
When you're matching a sequence that must not contain certain characters, use a negated character class that excludes them. Here [^(>]* matches any run of characters other than ( and >, so it also works when neither symbol appears. I've simplified the pattern as well, based on your examples. The only downside is that the captured group can include trailing spaces.
r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)'
>>> for test in tests:
...     print(re.findall(r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)', test))
...
[('#26,968', 'Office Products ')]
[('#13,452', 'Industrial & Scientific ')]
[('#99,999', 'baby')]
[('#888', 'office supplies')]
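If the trailing spaces matter, one option is to strip the captured group after matching. This is only a sketch: the sample strings and the helper name parse_rank are made up for illustration, since the asker's actual test strings aren't shown here.

```python
import re

# Pattern from the answer above: rank in group 1, category in group 2.
PATTERN = re.compile(r'.*(#\d*\,?\d+)\s+in\s+([^(>]*)')

def parse_rank(text):
    """Return (rank, category) with whitespace stripped, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    rank, category = match.groups()
    # The negated class stops at ( or > but keeps the space before it.
    return rank, category.strip()

# Hypothetical inputs, not the asker's actual data.
samples = [
    "#26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#99,999 in baby",
    "#888 in office supplies",
]
for sample in samples:
    print(parse_rank(sample))
```

Stripping afterwards keeps the regex simple; the alternative is to handle the whitespace inside the pattern itself.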
Upvotes: 1
Reputation: 66
It may not be the best pattern and could match more than intended, but if the samples provided are representative of the data, here is another pattern to try:
r"([#\d,]+) in ([\w\s&]+)>?([\w\s&]*)([()\w\s\d]*)"
https://regex101.com/r/hKD6AX/2
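For reference, a rough sketch of how this pattern could be used from Python. The input strings and the helper name extract_category are assumptions for illustration only. Group 2 may carry a trailing space, so it is stripped here.

```python
import re

# Pattern from this answer; group 1 is the rank, group 2 the first category.
PATTERN = re.compile(r"([#\d,]+) in ([\w\s&]+)>?([\w\s&]*)([()\w\s\d]*)")

def extract_category(text):
    """Return (rank, first category) or None if the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # [\w\s&]+ is greedy and includes whitespace, hence the strip().
    return match.group(1), match.group(2).strip()

# Hypothetical inputs, not the asker's actual data.
for sample in [
    "#26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#99,999 in baby",
]:
    print(extract_category(sample))
```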
Hope this helps!
Upvotes: 0
Reputation: 106455
You can match the end of string in an alternation instead:
.*(#\d*\,?\d+)\s.*in\s(.*?)\s*(?=[(>]|$)
Demo: https://regex101.com/r/BliHlU/1
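A small sketch of this approach in Python, assuming inputs shaped like the hypothetical strings below (the asker's real test strings aren't shown here, and rank_and_category is just an illustrative name). The lazy (.*?) followed by \s* and the lookahead means the captured category has no trailing space, whether the string ends with ( or > text or simply ends.

```python
import re

# Pattern from this answer: the lookahead accepts "(", ">", or end of string.
PATTERN = re.compile(r'.*(#\d*\,?\d+)\s.*in\s(.*?)\s*(?=[(>]|$)')

def rank_and_category(text):
    """Return (rank, category) or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None

# Hypothetical inputs, not the asker's actual data.
samples = [
    "#26,968 in Office Products (See Top 100)",
    "#13,452 in Industrial & Scientific > Lab Supplies",
    "#99,999 in baby",
    "#888 in office supplies",
]
for sample in samples:
    print(rank_and_category(sample))
```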
Upvotes: 2