Reputation: 99
Let's say that I have a string variable that can be in either of two formats:
[+] Software in use: Calculator
or, in some cases, the software version number is shown:
[+] Software in use: Calculator - v2.3
I am attempting to capture 1) the software name, and 2) if provided, the version number.
Here's what I have so far:
import re

line = '[+] Software in use: Calculator - v2.3'
# Raw string so the backslashes reach the regex engine unchanged
searchObj = re.search(r'\[\+\] Software in use: (.+)( - v(\d+.\d+))?', line)
searchObj.group(1) returns the entire "Calculator - v2.3". Why is the regex not splitting it up into groups?
searchObj.group(2) and searchObj.group(3) come back empty (None). I thought that parentheses signified a capture group. Am I overlooking something?
Upvotes: 3
Views: 122
Reputation: 61
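The regex quantifiers + and * are greedy: they match as much as possible before the rest of the pattern is tried.
In your regex you use (.+)( - v(\d+.\d+))?
The ? after the second group makes that group optional. It is not a laziness marker in that position; ? only means "lazy" when it directly follows another quantifier, as in .+?
Because the first group is greedy, .+ consumes the rest of the line, including " - v2.3". The optional second group is then free to match nothing, so groups 2 and 3 are never filled.
To fix it, make the first group lazy by putting the ? inside it, and anchor the end of the pattern so the lazy group cannot stop after a single character:
(.+?)( - v(\d+\.\d+))?$
The dot in the version number is escaped here so that it only matches a literal period.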
I hope my explanation makes sense
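A minimal sketch of the greedy versus lazy behaviour described above, assuming Python's re module as in the question. The end anchor $ and the escaped dot in the version number are additions for illustration, and the names PREFIX, greedy, and lazy are hypothetical.

```python
import re

# Shared literal prefix (hypothetical name, for illustration only)
PREFIX = r'\[\+\] Software in use: '

# Greedy first group: .+ swallows the whole remainder of the line,
# so the optional version group matches nothing.
greedy = re.compile(PREFIX + r'(.+)( - v(\d+\.\d+))?')

# Lazy first group plus an end anchor: .+? grows one character at a time
# until the optional version group and $ can both match.
lazy = re.compile(PREFIX + r'(.+?)( - v(\d+\.\d+))?$')

line = '[+] Software in use: Calculator - v2.3'
greedy_groups = greedy.search(line).groups()
lazy_groups = lazy.search(line).groups()

print(greedy_groups)  # ('Calculator - v2.3', None, None)
print(lazy_groups)    # ('Calculator', ' - v2.3', '2.3')
```

Without the $ anchor, the lazy group would be satisfied after a single character, which is why the anchor is included in this sketch.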
Upvotes: 3
Reputation: 67968
import re

line = '[+] Software in use: Calculator - v2.3'
searchObj = re.search(r'\[\+\] Software in use: (.+?)(?:( - v(\d+\.\d+))|$)', line)
Make the first group non-greedy (.+? instead of .+). See demo.
https://regex101.com/r/eB8xU8/10
or
\[\+\] Software in use: (.+?)( - v(\d+\.\d+))?\b
See demo.
https://regex101.com/r/eB8xU8/11
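As a rough sketch of how the non-greedy approach could be wrapped up for both input formats, the snippet below uses a hypothetical helper named parse_software (not part of the original answer). It assumes the version always looks like " - v<digits>.<digits>" and anchors with $ rather than \b, since \b can stop after the first word of a multi-word software name.

```python
import re

# Hypothetical pattern and helper, shown for illustration only.
# (?: ... )? makes the " - v<version>" suffix optional without capturing it;
# $ forces the lazy name group to extend up to the suffix or the line end.
PATTERN = re.compile(r'\[\+\] Software in use: (.+?)(?: - v(\d+\.\d+))?$')


def parse_software(line):
    """Return (name, version); version is None when no version is shown."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_software('[+] Software in use: Calculator'))
print(parse_software('[+] Software in use: Calculator - v2.3'))
print(parse_software('[+] Software in use: Disk Usage Analyzer - v10.12'))
```

The first call yields a version of None, while the other two return the name and the version string as separate values.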
Upvotes: 5