Scheinin

Reputation: 195

What is wrong with this regex?

I am using Python.
The pattern is:

re.compile(r'^(.+?)-?.*?\(.+?\)')

The texts look like this:

text1 = 'TVTP-S2(xxxx123123)'

text2 = 'TVTP(xxxx123123)'

In both cases I expect to get TVTP.

Upvotes: 3

Views: 110

Answers (4)

The fourth bird

Reputation: 163217

Another option to match those formats is:

^([^-()]+)(?:-[^()]*)?\([^()]*\)

Explanation

  • ^ Start of string
  • ([^-()]+) Capture group 1, match one or more characters other than -, ( and )
  • (?:-[^()]*)? Since - is excluded from the first part, optionally match a - followed by any characters other than ( and )
  • \([^()]*\) Match from ( to ) without matching any parentheses in between


Example

import re

regex = r"^([^-()]+)(?:-[^()]*)?\([^()]*\)"
s = ("TVTP-S2(xxxx123123)\n"
    "TVTP(xxxx123123)\n")
    
print(re.findall(regex, s, re.MULTILINE))

Output

['TVTP', 'TVTP']
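To check each string on its own, the same pattern can be used with re.match, which returns None when there is no (...) part. This is a minimal sketch; the extra samples 'TVTP' and 'TVTP-S2' are made-up inputs added only to show the non-matching case.

```python
import re

# Same pattern as above: capture everything before the first '-' or '('
pattern = re.compile(r"^([^-()]+)(?:-[^()]*)?\([^()]*\)")

# The two strings from the question plus two hypothetical strings without parentheses
samples = ["TVTP-S2(xxxx123123)", "TVTP(xxxx123123)", "TVTP", "TVTP-S2"]

results = {}
for text in samples:
    m = pattern.match(text)
    # m is None when the string has no "(...)" part
    results[text] = m.group(1) if m else None
    print(text, "->", results[text])
```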

Upvotes: 3

Matt Miguel

Reputation: 1375

It fails because the first + is lazy (+?), the dash after it is optional, and it is followed by .*?, which can match any character.

This lets the regex engine put just the single letter T in the first group (because it is lazy), treat the dash as absent (allowed because it is followed by a question mark), and then let the next .*? match "VTP-S2".
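One way to see this is to wrap -? and .*? in extra capture groups so that what each piece matched becomes visible. The instrumented pattern below is a hypothetical variant made for illustration only; adding capturing groups does not change what the regex matches.

```python
import re

# The question's pattern, with extra groups around -? and .*? added only
# to expose what each piece matched (grouping does not change the match)
instrumented = re.compile(r'^(.+?)(-?)(.*?)\(.+?\)')

texts = ['TVTP-S2(xxxx123123)', 'TVTP(xxxx123123)']
parts = [instrumented.match(t).groups() for t in texts]
for t, p in zip(texts, parts):
    # (.+?) keeps one char, (-?) is empty, (.*?) takes the rest up to '('
    print(t, '->', p)
```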

Instead, capture non-dash characters, followed by non-parenthesis characters up to the opening parenthesis.

import re
p = re.compile(r'^([^-]*)[^(]*\(.+?\)')  # greedy [^-]*; a lazy [^-]*? would capture ''
p.search('TVTP-S2(xxxx123123) blah()').group(1)  # 'TVTP'

The non-parenthesis part [^(]* prevents the second portion from matching 'S2(xxxx123123) blah(' in my modified example above, as a plain .* could.

Upvotes: 0

hwangzhiming

Reputation: 31

A quick answer would be:

^(\w+)(-.*?)?\((.*?)\)$

https://regex101.com/r/wL4jKe/2/
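A short usage sketch in Python for the pattern above (the variable names are just for illustration). Group 1 holds the name, group 2 the optional -suffix (None when absent), and group 3 the text inside the parentheses.

```python
import re

# The "quick answer" pattern: group 1 = name, group 2 = optional "-suffix",
# group 3 = text inside the parentheses
quick = re.compile(r'^(\w+)(-.*?)?\((.*?)\)$')

groups = {}
for text in ['TVTP-S2(xxxx123123)', 'TVTP(xxxx123123)']:
    m = quick.match(text)
    groups[text] = m.groups()
    print(text, '->', m.groups())
```

Note that \w+ only accepts letters, digits and underscores in the name, and the trailing $ requires the closing ) to end the string.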

Upvotes: 0

Jarvis

Reputation: 8564

This regex works:

>>> import re
>>> pattern = r'^([^-]+).*\(.+?\)'
>>> re.findall(pattern, 'TVTP-S2(xxxx123123)')
['TVTP']
>>> re.findall(pattern, 'TVTP(xxxx123123)')
['TVTP']

Upvotes: 1
