Isaiah Y

Reputation: 99

Python regex returning one group instead of two. What am I missing here?

Let's say I have a variable whose value can be in either of these formats:

[+] Software in use: Calculator

or, in some cases, the software version number is shown:

[+] Software in use: Calculator - v2.3

I am attempting to capture 1) the software name, and 2) if provided, the version number.

Here's what I have so far:

import re

line = '[+] Software in use: Calculator - v2.3'
searchObj = re.search(r'\[\+\] Software in use: (.+)( - v(\d+.\d+))?', line)

searchObj.group(1) returns the entire "Calculator - v2.3". Why isn't the regex splitting it into groups? searchObj.group(2) and searchObj.group(3) come back empty (None). I thought that parentheses signified a capture group. Am I overlooking something?

Upvotes: 3

Views: 122

Answers (2)

Noxfadh

Reputation: 61

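The quantifiers + and * are greedy: they match as much as possible before the rest of the pattern is tried.

In your regex, (.+)( - v(\d+.\d+))?, the trailing ? does not make the second group lazy; it makes it optional (zero or one occurrence). Because (.+) greedily consumes the rest of the line, including " - v2.3", and the optional group is allowed to match nothing, the overall match succeeds without the engine ever backtracking. That is why groups 2 and 3 come back as None.

To fix it, make the first group lazy by putting the ? inside the parentheses, and anchor the end of the pattern so the lazy group cannot stop after a single character:

(.+?)( - v(\d+\.\d+))?$

Note the escaped dot in \d+\.\d+; an unescaped . matches any character, not just a literal period.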
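To see the difference in practice, here is a minimal sketch comparing three variants on the line from the question. The variable names (prefix, greedy, lazy, anchored) are just illustrative, and the dot in the version number is escaped here even though the original pattern leaves it unescaped.

```python
import re

line = '[+] Software in use: Calculator - v2.3'
prefix = r'\[\+\] Software in use: '

# Greedy: (.+) consumes the rest of the line, so the optional
# version group is left with nothing to match.
greedy = re.search(prefix + r'(.+)( - v(\d+\.\d+))?', line)
print(greedy.groups())    # ('Calculator - v2.3', None, None)

# Lazy but unanchored: (.+?) is satisfied after one character,
# and the optional group is simply skipped.
lazy = re.search(prefix + r'(.+?)( - v(\d+\.\d+))?', line)
print(lazy.groups())      # ('C', None, None)

# Lazy and anchored with $: the name grows only until the optional
# version suffix followed by end-of-line fits.
anchored = re.search(prefix + r'(.+?)( - v(\d+\.\d+))?$', line)
print(anchored.groups())  # ('Calculator', ' - v2.3', '2.3')
```

The middle case is the reason a lazy quantifier on its own is not enough: something after the optional group has to force the match to extend to the end of the line.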

I hope my explanation makes sense

Upvotes: 3

vks

Reputation: 67968

import re

line = '[+] Software in use: Calculator - v2.3'
searchObj = re.search(r'\[\+\] Software in use: (.+?)(?:( - v(\d+\.\d+))|$)', line)

                                                  ^^

Make the first group non-greedy, and require either the version suffix or the end of the line after it. See the demo:

https://regex101.com/r/eB8xU8/10

Or:

\[\+\] Software in use: (.+?)( - v(\d+\.\d+))?\b

See demo.

https://regex101.com/r/eB8xU8/11
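For completeness, a small sketch of how the lazy-group idea could be wrapped up for both input formats. The helper name parse_software and the sample name "Disk Utility" are made up for illustration, and this variant anchors with $ and uses a non-capturing group for the " - v" part rather than either exact pattern above.

```python
import re

# Hypothetical pattern: lazy name, optional version suffix, end of line.
PATTERN = re.compile(r'\[\+\] Software in use: (.+?)(?: - v(\d+\.\d+))?$')

def parse_software(line):
    """Return (name, version); version is None when no version is shown."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_software('[+] Software in use: Calculator'))
# ('Calculator', None)
print(parse_software('[+] Software in use: Calculator - v2.3'))
# ('Calculator', '2.3')
print(parse_software('[+] Software in use: Disk Utility - v10.4'))
# ('Disk Utility', '10.4')
```

One reason to prefer $ over \b here: with \b, the lazy group can stop at the first word boundary, so a multi-word name like "Disk Utility" would be cut down to "Disk".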

Upvotes: 5
