mimipc

Reputation: 1374

How to avoid different capture group numbers in a regex?

I'm trying to capture an IP address from a log line and fall back to the hostname if the address is 0.0.0.0.

Here are some examples of logs:

Foo bar ip=0.0.0.0 baz host=YOLO-PC foobar bazinga

In this case, I want "YOLO-PC" because IP is 0.0.0.0

Foo bar ip=12.23.34.45 baz host=FOOBAR-PC foobar bazinga

In this case, I want 12.23.34.45.

Here's what I tried:

ip=(?:0\.0\.0\.0|(\d+\.\d+\.\d+\.\d+)).*?host=(?(1).|(\S+))

It works, but when the IP is 0.0.0.0, the hostname ends up in a second group, and the program behind it can only fetch group #1, not group #2.

How can I do this? Put it all in only one group? Is there a better solution?

Upvotes: 1

Views: 97

Answers (3)

asontu

Reputation: 4659

It's unclear from your question which environment/language/regex flavour you're dealing with. But PCRE regexes let you do this with a branch reset group, written (?|some(capture)|another(capture)):

ip=(?|0\.0\.0\.0.*?host=(\S+)|(\d+\.\d+\.\d+\.\d+))

Inside the branch reset, both capture groups are numbered 1, so whichever alternative matches, the result lands in group 1.

Alternatively (if you're not using PCRE), I guess you could do this. It's less strict, but works in almost every engine. Your current regex isn't particularly strict with the IP format (allowing numbers higher than 255, etc.), so maybe this is not an issue for you.

ip=(?:0\.0\.0\.0.*?host=)?(\S+)
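As a rough illustration, here is how that portable pattern might be used from Python's standard `re` module. The function name `extract_ip_or_host` is made up for this sketch, and the two log lines are the ones from the question.

```python
import re

# Portable pattern from this answer: the optional prefix swallows
# "0.0.0.0 ... host=" so the single group captures the hostname instead.
PATTERN = re.compile(r"ip=(?:0\.0\.0\.0.*?host=)?(\S+)")


def extract_ip_or_host(line):
    """Return group 1 (IP, or hostname when the IP is 0.0.0.0), or None."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


print(extract_ip_or_host("Foo bar ip=0.0.0.0 baz host=YOLO-PC foobar bazinga"))
print(extract_ip_or_host("Foo bar ip=12.23.34.45 baz host=FOOBAR-PC foobar bazinga"))
```

Note that if a line has ip=0.0.0.0 but no host= after it, the optional prefix is skipped and group 1 simply contains 0.0.0.0.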

Upvotes: 3

Bohemian

Reputation: 425033

Use an alternation, which is attempted left to right:

(?<=ip=)(?!0\.0\.0\.0)\S+|(?<=host=)\S+

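This matches only your target input because it uses lookarounds: the match itself is just the value, with no capture groups at all. The negative lookahead rejects the IP if it's all zeros, so the regex falls through to the hostname.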

Just pick only the first match.
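A minimal Python sketch of "pick only the first match", assuming the lookbehind includes the `=` and the dots in the lookahead are escaped. The helper name `first_ip_or_host` is hypothetical; `re.search` returns the leftmost match, which is what this approach relies on.

```python
import re

# Whole-match approach: no capture groups, the match itself is the value.
# The first alternative takes the IP unless it is 0.0.0.0; the second
# alternative takes whatever follows "host=".
PATTERN = re.compile(r"(?<=ip=)(?!0\.0\.0\.0)\S+|(?<=host=)\S+")


def first_ip_or_host(line):
    """Return the leftmost match of the alternation, or None."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


print(first_ip_or_host("Foo bar ip=0.0.0.0 baz host=YOLO-PC foobar bazinga"))
print(first_ip_or_host("Foo bar ip=12.23.34.45 baz host=FOOBAR-PC foobar bazinga"))
```

This depends on ip= appearing before host= in the line, as in the sample logs; if host= came first, the leftmost match would be the hostname even for a non-zero IP.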

Upvotes: 0

Andrei Vajna II

Reputation: 4842

The number of groups in your result is equal to the number of ( ) capture groups in the regex, and they are numbered in the order their opening parens appear. Some of the groups might not take part in the match and will be empty.

So in your case, you will always have two groups. Group 1 is the non-zero IP and group 2 is the hostname. If the IP is 0.0.0.0, then group 1 will be empty. If not, then group 2 will be empty.

Can't you just check in your code which group is empty and use the other one?
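If the calling code can be changed, the check could look something like this in Python. This is only a sketch: it reuses the regex from the question as-is (Python's `re` supports the `(?(1)...|...)` conditional), and `pick_nonempty_group` is a name invented for the example.

```python
import re

# The regex from the question, unchanged: group 1 is the non-zero IP,
# group 2 is the hostname (only captured when group 1 did not match).
PATTERN = re.compile(
    r"ip=(?:0\.0\.0\.0|(\d+\.\d+\.\d+\.\d+)).*?host=(?(1).|(\S+))"
)


def pick_nonempty_group(line):
    """Return whichever of group 1 / group 2 matched, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # In Python an unmatched group is None, so `or` falls through to group 2.
    return match.group(1) or match.group(2)


print(pick_nonempty_group("Foo bar ip=0.0.0.0 baz host=YOLO-PC foobar bazinga"))
print(pick_nonempty_group("Foo bar ip=12.23.34.45 baz host=FOOBAR-PC foobar bazinga"))
```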

Upvotes: 1
