mykiwi

Reputation: 1693

Regex capture only the last child, not all

I would like to run a regex against each line of the following input:

127.0.0.1 localhost
# 127.0.0.1 fake
1.2.3.4 foo bar baz

The goal is to ignore any line that starts with a #. For every other line, I want to capture the IP and each string after it.

Here is my attempt:

{^\s?(?<ip>[^#\s]+)(?:\s+(?<domain>[^\s]+))*$}

My problem is that when I run this on 1.2.3.4 foo bar baz, it only captures baz, not foo and bar. I would like to capture every domain.

PS: I'm using PHP. You can try it here: https://regex101.com/r/S8Fzlu/1

Upvotes: 2

Views: 67

Answers (2)

anubhava

Reputation: 786359

PCRE, the regex engine PHP uses, does not create a new capture group for each iteration of a quantified group. Each iteration overwrites the previous capture, so only the last captured string is returned. That is why you see only baz in the second capture group (domain).
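To make the behaviour concrete, here is a minimal sketch that runs the pattern from the question with preg_match on the sample line. The variable names $re and $m are only illustrative.

```php
<?php
// The pattern from the question, unchanged, using { } as delimiters.
$re = '{^\s?(?<ip>[^#\s]+)(?:\s+(?<domain>[^\s]+))*$}';

preg_match($re, '1.2.3.4 foo bar baz', $m);

echo $m['ip'], "\n";     // 1.2.3.4
echo $m['domain'], "\n"; // baz (foo and bar were overwritten by later iterations)
```

The group named domain is filled three times while matching, but only the value from the final iteration is still there when the match completes.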

However, you can use the \G anchor together with preg_match_all to capture every string with this regex:

(?:^\h*(?<ip>(?:\d+\.){3}\d+)|(?!^)\G)\h+(?<domain>\S+)


  • \G asserts position at the end of the previous match or the start of the string for the first match

Code:

$str = '1.2.3.4 foo bar baz';
$re = '/(?:^\h*(?<ip>(?:\d+\.){3}\d+)|(?!^)\G)\h+(?<domain>\S+)/';
preg_match_all($re, $str, $m);

print_r($m['ip']);
print_r($m['domain']);

Output:

Array
(
    [0] => 1.2.3.4
    [1] =>
    [2] =>
)
Array
(
    [0] => foo
    [1] => bar
    [2] => baz
)
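The same regex can be applied line by line to the full input from the question. The following is only a sketch: parseHosts is a hypothetical helper name, and it assumes each non-comment line begins with an IPv4-style address, as in the sample.

```php
<?php
// Hypothetical helper: turn hosts-style text into [ip => [domain, ...]].
function parseHosts(string $text): array
{
    $re = '/(?:^\h*(?<ip>(?:\d+\.){3}\d+)|(?!^)\G)\h+(?<domain>\S+)/';
    $hosts = [];

    // Split on any newline sequence and test each line separately.
    foreach (preg_split('/\R/', $text) as $line) {
        // Lines starting with # never match: the first alternative needs
        // an IP at the start, and (?!^)\G cannot match at offset 0.
        if (preg_match_all($re, $line, $m) > 0) {
            // The ip group is only filled in the first match of a line.
            $hosts[$m['ip'][0]] = $m['domain'];
        }
    }

    return $hosts;
}

$input = "127.0.0.1 localhost\n# 127.0.0.1 fake\n1.2.3.4 foo bar baz";
print_r(parseHosts($input));
```

For the sample input this yields localhost under 127.0.0.1 and foo, bar, baz under 1.2.3.4, while the commented line is skipped.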

Upvotes: 1

Poul Bak

Reputation: 10940

I'm not sure how PHP regex works, but this regex works in JavaScript and C#, so give it a try:

^\s?(?<ip>[^#\s]+)(?:\s+(?<domain>[^.]+)*)$

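Note that I have moved the '*' so that it sits directly after the domain group, inside the non-capturing group, and changed the character class to [^.]. On 1.2.3.4 foo bar baz, domain then captures the whole remainder, foo bar baz, as a single string.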

Upvotes: 0
