Stefano Maglione

Reputation: 4160

PHP preg_match optional group

I wrote a regex:

(^.*)(\[{1}[0-9]+:[0-9]+:[0-9]+:[0-9]+\]{1}) (\"{1}.+\"{1}) ([0-9]+) ([0-9-]+)

to match a string like:

141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233

using PHP's preg_match.

When I remove the first part of the string, for example 141.243.1.172, preg_match returns:

array(6
 0  =>  [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233
 1  =>  // correctly empty
 2  =>  [29:23:53:25]
 3  =>  "GET /Software.html HTTP/1.0"
 4  =>  200
 5  =>  233
 )

Index 1 is correctly empty. But if I remove [29:23:53:25] from the string instead, preg_match gives me an empty array. How can I get the same kind of result as above, with only the corresponding index empty rather than no match at all?
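The behaviour described above can be reproduced with a short script. This is only a sketch: the `/` delimiters and the variable names are assumptions added for illustration, and the pattern and sample lines are taken from the question.

```php
<?php
// Pattern from the question, wrapped in / delimiters (an assumption) for preg_match.
$pattern = '/(^.*)(\[{1}[0-9]+:[0-9]+:[0-9]+:[0-9]+\]{1}) (\"{1}.+\"{1}) ([0-9]+) ([0-9-]+)/';

// IP address removed: (^.*) can match the empty string, so the line still matches.
$noIp = '[29:23:53:25] "GET /Software.html HTTP/1.0" 200 233';

// Timestamp removed: the mandatory \[...\] group has nothing to match.
$noTime = '141.243.1.172 "GET /Software.html HTTP/1.0" 200 233';

$resultNoIp = preg_match($pattern, $noIp, $matchesNoIp);
$resultNoTime = preg_match($pattern, $noTime, $matchesNoTime);

var_dump($resultNoIp);     // int(1)
var_dump($matchesNoIp[1]); // string(0) ""
var_dump($resultNoTime);   // int(0)
var_dump($matchesNoTime);  // empty array
```

The second call fails because the bracketed timestamp group is required, so preg_match finds no match and leaves the matches array empty.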

Upvotes: 1

Views: 809

Answers (2)

codegames

Reputation: 1921

Change the regex to this:

((^.*)(\[{1}[0-9]+:[0-9]+:[0-9]+:[0-9]+\]{1}) )?(\"{1}.+\"{1}) ([0-9]+) ([0-9-]+)

For 141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233

the result would be:

Array
(
    [0] => 141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233
    [1] => 141.243.1.172 [29:23:53:25] 
    [2] => 141.243.1.172
    [3] => [29:23:53:25]
    [4] => "GET /Software.html HTTP/1.0"
    [5] => 200
    [6] => 233
)

And for [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233

the result would be:

Array
(
    [0] => [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233
    [1] => [29:23:53:25] 
    [2] => 
    [3] => [29:23:53:25]
    [4] => "GET /Software.html HTTP/1.0"
    [5] => 200
    [6] => 233
)
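One case not covered by the two outputs above is the one from the question, where the timestamp is removed but the IP address stays. A rough check, assuming `/` delimiters and a made-up variable name `$m`, suggests that this pattern still matches such a line but no longer captures the address, because the whole optional prefix is skipped and the match starts at the quote:

```php
<?php
// Pattern from this answer, wrapped in / delimiters (an assumption).
$pattern = '/((^.*)(\[{1}[0-9]+:[0-9]+:[0-9]+:[0-9]+\]{1}) )?(\"{1}.+\"{1}) ([0-9]+) ([0-9-]+)/';

// The question's problem case: timestamp removed, IP address kept.
$line = '141.243.1.172 "GET /Software.html HTTP/1.0" 200 233';

preg_match($pattern, $line, $m);

var_dump($m[0]); // the match starts at the opening quote, not at the IP
var_dump($m[2]); // empty: the IP address is not captured
var_dump($m[4]); // the quoted request
```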

Upvotes: 0

The fourth bird

Reputation: 163632

Removing the first part works because .* can also match an empty string. If you want to be able to remove the second part as well, you could make both groups optional and the first one non-greedy. Move the space into the second group as well.

Note that you don't have to escape the double quote, and that the quantifier {1} is superfluous, so it can be omitted.

In the example there is only one double quote after the opening one, but to prevent possible overmatching you could make that part non-greedy as well, or use a negated character class ("[^"]+") instead, which also avoids unnecessary backtracking.
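To illustrate the overmatching point, here is a small sketch that compares only the quoted part of the pattern in its greedy form and in its negated character class form. The input line with a second quoted string is contrived for this illustration and does not come from the question, and the variable names are assumptions.

```php
<?php
// Contrived line with a second quoted string after the status and size fields.
$line = '"GET /a HTTP/1.0" 200 233 "note" 1 2';

// Greedy .+ as in the original pattern: runs up to the LAST usable quote.
preg_match('/(".+") ([0-9]+) ([0-9-]+)/', $line, $greedy);

// Negated character class: cannot cross a double quote, so it stops at the first one.
preg_match('/("[^"]+") ([0-9]+) ([0-9-]+)/', $line, $negated);

var_dump($greedy[1]);  // "GET /a HTTP/1.0" 200 233 "note"
var_dump($negated[1]); // "GET /a HTTP/1.0"
```

With the greedy version the status and size groups end up as 1 and 2 instead of 200 and 233, which is the kind of overmatching the negated class guards against.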

(^.*?)?(\[[0-9]+:[0-9]+:[0-9]+:[0-9]+\] )?(".+?") ([0-9]+) ([0-9-]+)


For example

$strings = [
    '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233',
    '[29:23:53:25] "GET /Software.html HTTP/1.0" 200 233',
    '"GET /Software.html HTTP/1.0" 200 233'
];

$pattern = '/(^.*?)?(\[[0-9]+:[0-9]+:[0-9]+:[0-9]+\] )?(".+?") ([0-9]+) ([0-9-]+)/';

foreach ($strings as $string) {
    preg_match($pattern, $string, $matches);
    print_r($matches);
}

Result

Array
(
    [0] => 141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233
    [1] => 141.243.1.172 
    [2] => [29:23:53:25] 
    [3] => "GET /Software.html HTTP/1.0"
    [4] => 200
    [5] => 233
)
Array
(
    [0] => [29:23:53:25] "GET /Software.html HTTP/1.0" 200 233
    [1] => 
    [2] => [29:23:53:25] 
    [3] => "GET /Software.html HTTP/1.0"
    [4] => 200
    [5] => 233
)
Array
(
    [0] => "GET /Software.html HTTP/1.0" 200 233
    [1] => 
    [2] => 
    [3] => "GET /Software.html HTTP/1.0"
    [4] => 200
    [5] => 233
)
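The three strings above do not include the exact case from the question, where the timestamp is removed but the IP address is kept. A minimal sketch of that case follows. The variable names are assumptions, and the PREG_UNMATCHED_AS_NULL flag (available in PHP 7.2 and later) is an optional extra that is not part of the original answer; it reports a group that did not take part in the match as null instead of an empty string.

```php
<?php
$pattern = '/(^.*?)?(\[[0-9]+:[0-9]+:[0-9]+:[0-9]+\] )?(".+?") ([0-9]+) ([0-9-]+)/';

// Timestamp removed, IP address kept.
$line = '141.243.1.172 "GET /Software.html HTTP/1.0" 200 233';

// PREG_UNMATCHED_AS_NULL makes the skipped timestamp group null rather than "".
preg_match($pattern, $line, $m, PREG_UNMATCHED_AS_NULL);

var_dump($m[1]);        // the IP address, including its trailing space
var_dump(rtrim($m[1])); // the IP address with the trailing space removed
var_dump($m[2]);        // NULL: the timestamp group did not participate
var_dump($m[3]);        // the quoted request
```

Because the space after the IP address now belongs to the first group, a call such as rtrim is one way to drop it before further processing.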


Upvotes: 2
