Reputation: 195
I am using python.
The pattern is:
re.compile(r'^(.+?)-?.*?\(.+?\)')
The text like:
text1 = 'TVTP-S2(xxxx123123)'
text2 = 'TVTP(xxxx123123)'
I expect to get TVTP
Upvotes: 3
Views: 110
Reputation: 163217
Another option to match those formats is:
^([^-()]+)(?:-[^()]*)?\([^()]*\)
Explanation
^
Start of string([^-()]+)
Capture group 1, match 1+ times any character other than -
(
and )
(?:-[^()]*)?
As the -
is excluded from the first part, optionally match -
followed by any char other than (
and )
\([^()]*\)
Match from (
till )
without matching any parenthesis between themExample
import re
regex = r"^([^-()]+)(?:-[^()]*)?\([^()]*\)"
s = ("TVTP-S2(xxxx123123)\n"
"TVTP(xxxx123123)\n")
print(re.findall(regex, s, re.MULTILINE))
Output
['TVTP', 'TVTP']
Upvotes: 3
Reputation: 1375
It is because the first plus is lazy, and the subsequent dash is optional, followed by a pattern that allows any character.
This allows the regex engine to choose the single letter T for the first group (because it is lazy), choose to interpret the dash as just not being there, which is allowed because it is followed by a question mark, and then have the next .* match "VTP-S2".
You can just grab non-dashes to capture, followed by nonparentheses up to the parentheses.
p=re.compile(r'^([^-]*?)[^(]*\(.+?\)')
p.search('TVTP-S2(xxxx123123) blah()').group(1)
The nonparentheses part prevents the second portion from matching 'S2(xxxx123123) blah(' in my modified example above.
Upvotes: 0
Reputation: 31
a quick answer will be
^(\w+)(-.*?)?\((.*?)\)$
https://regex101.com/r/wL4jKe/2/
Upvotes: 0
Reputation: 8564
This regex works:
pattern = r'^([^-]+).*\(.+?\)'
>>> re.findall(pattern, 'TVTP-S2(xxxx123123)')
['TVTP']
>>> re.findall(pattern, 'TVTP(xxxx123123)')
['TVTP']
Upvotes: 1