F21

Reputation: 33391

"Optional" substring matching with regex

I am writing a regular expression in PHP that will need to extract data from strings that look like:

Naujasis Salemas, Šiaurės Dakota
Jungtinės Valstijos (Centras, Šiaurės Dakota)

I would like to extract:

Naujasis Salemas
Centras

For the first case, I have written [^-]*(?=,), which works quite well. I would like to modify the expression so that, if the string contains parentheses ( and ), it searches between those parentheses and extracts everything before the comma.

Is it possible to do this with a single expression? If so, how can I make it search within the parentheses when they exist?
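
To make the problem concrete, here is a minimal sketch that runs the current pattern against both sample strings. The variable names are only illustrative. On the second string the match starts at the beginning of the string rather than inside the parentheses.

```php
<?php
// The two sample strings from the question.
$samples = [
    'Naujasis Salemas, Šiaurės Dakota',
    'Jungtinės Valstijos (Centras, Šiaurės Dakota)',
];

$results = [];
foreach ($samples as $sample) {
    // Current pattern: anything without a hyphen, up to a comma.
    preg_match('/[^-]*(?=,)/', $sample, $m);
    $results[] = $m[0];
    echo $m[0], PHP_EOL;
}
// Prints:
// Naujasis Salemas
// Jungtinės Valstijos (Centras
```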

Upvotes: 0

Views: 410

Answers (3)

user212218

A conditional might help you here:

$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';

$regex = '
  /^                    # Anchor at start of string.
    (?(?=.*\(.+,.*\))   # Condition to check for: presence of text in parentheses.
        .*\(([^,]+)     # If condition matches, match inside parentheses to first comma.
      | ([^,]+)         # Else match start of string to first comma.
    )
  /x
';
preg_match($regex, $stra, $matches) and print_r($matches);

/*
Array
(
    [0] => Naujasis Salemas
    [1] => 
    [2] => Naujasis Salemas
)
*/

preg_match($regex, $strb, $matches) and print_r($matches);

/*
Array
(
    [0] => Jungtinės Valstijos (Centras
    [1] => Centras
)
*/

Note that the captured text lands at a different index in $matches in each case (index 2 for the first string, index 1 for the second), but you might be able to work around that using named subpatterns.
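
As an alternative to named subpatterns, a branch-reset group, (?|...), may also keep the index stable, because each alternative numbers its capture groups from 1. The sketch below uses that different technique rather than the conditional above, and extract_place is a hypothetical helper name chosen for illustration.

```php
<?php
// Hypothetical helper: returns the text before the first comma,
// looking inside parentheses when the string contains them.
function extract_place(string $subject): ?string
{
    $regex = '/^
        (?|                          # Branch reset: every alternative starts at group 1.
            [^(]*\(([^,)]+),[^)]*\)  # Parenthesized form: capture up to the first comma inside.
          | ([^,]+)                  # Otherwise: capture from the start to the first comma.
        )
    /x';

    // In both branches the wanted text is in group 1.
    return preg_match($regex, $subject, $matches) === 1 ? $matches[1] : null;
}

echo extract_place('Naujasis Salemas, Šiaurės Dakota'), PHP_EOL;              // Naujasis Salemas
echo extract_place('Jungtinės Valstijos (Centras, Šiaurės Dakota)'), PHP_EOL; // Centras
```

With this form the caller always reads $matches[1], at the cost of a pattern that no longer uses a conditional.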

Upvotes: 2

Tim Pietzcker

Reputation: 336128

You could use

[^(),]+(?=,)

That matches a run of characters containing no commas or parentheses, provided it is immediately followed by a comma.
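
A minimal sketch of how this could be used from PHP, run against the two strings from the question. The helper name first_segment is hypothetical.

```php
<?php
// Hypothetical helper: first run of non-comma, non-parenthesis
// characters that is directly followed by a comma.
function first_segment(string $subject): ?string
{
    return preg_match('/[^(),]+(?=,)/', $subject, $m) === 1 ? $m[0] : null;
}

echo first_segment('Naujasis Salemas, Šiaurės Dakota'), PHP_EOL;              // Naujasis Salemas
echo first_segment('Jungtinės Valstijos (Centras, Šiaurės Dakota)'), PHP_EOL; // Centras
```

The whole match is the wanted text, so no capture group is needed.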

Upvotes: 1

Arnaud Le Blanc

Reputation: 99909

I think this one could do it:

[^-(]+(?=,)

This is the same regex as yours, except that it doesn't allow an opening parenthesis in the matched string. It still matches the first subject as before, and on the second the match starts just after the opening parenthesis.

Try it here: http://ideone.com/Crhzz
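
For reference, a small sketch of the same pattern run locally, in case the link is unavailable. The helper name before_comma and the third, three-part input are assumptions added for illustration. Since this character class still allows commas, the greedy match runs to the last comma when a string contains more than one.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function before_comma(string $subject): ?string
{
    return preg_match('/[^-(]+(?=,)/', $subject, $m) === 1 ? $m[0] : null;
}

echo before_comma('Naujasis Salemas, Šiaurės Dakota'), PHP_EOL;              // Naujasis Salemas
echo before_comma('Jungtinės Valstijos (Centras, Šiaurės Dakota)'), PHP_EOL; // Centras

// Made-up input with two commas: the match extends to the last comma.
echo before_comma('Centras, Šiaurės Dakota, JAV'), PHP_EOL;                  // Centras, Šiaurės Dakota
```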

Upvotes: 1
