Reputation: 44305
I would like to create a way of matching strings like
abc(xyz)
abc
abc(xyz)[123]
where each bracketed part is an optional unit. Ideally, I would like to have something like
preg_match_all('complicated regex', $mystring, $matches);
with $matches returning the following:
$mystring = 'abc(xyz)[123]R' gives $matches = array(0 => "abc", 1 => "xyz", 2 => "123", 3 => "R")
$mystring = 'abc(xyz)R' gives $matches = array(0 => "abc", 1 => "xyz", 2 => "", 3 => "R")
$mystring = 'abc[123]R' gives $matches = array(0 => "abc", 1 => "", 2 => "123", 3 => "R")
$mystring = 'abc(xyz)[123]' gives $matches = array(0 => "abc", 1 => "xyz", 2 => "123", 3 => "")
$mystring = 'abc' gives $matches = array(0 => "abc", 1 => "", 2 => "", 3 => "")
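For reference, here is a minimal sketch (not taken from any of the answers) of one way to get exactly these arrays. It assumes each input string holds a single token, so it uses preg_match with ^ and $ anchors rather than preg_match_all; it also assumes the first part is lowercase letters and the trailing part is at most one word character. The function name split_parts is hypothetical.

```php
<?php
// Hypothetical helper: split "name(paren)[bracket]X" into its four parts.
// Assumes the input holds a single token; missing parts become "".
function split_parts(string $s): array
{
    $pattern = '/^([a-z]+)'         // 1: leading lowercase word, e.g. "abc"
             . '(?:\(([^)]*)\))?'   // 2: optional "(...)", parens not captured
             . '(?:\[([^\]]*)\])?'  // 3: optional "[...]", brackets not captured
             . '(\w?)$/';           // 4: optional single trailing word character
    if (!preg_match($pattern, $s, $m)) {
        return [];                  // the string does not have the expected shape
    }
    // Drop the full match; pad defensively in case trailing groups are omitted.
    return array_pad(array_slice($m, 1), 4, '');
}

print_r(split_parts('abc(xyz)[123]R'));
print_r(split_parts('abc[123]R'));
print_r(split_parts('abc'));
```

Because group 4 can match the empty string, it always participates, so PHP reports the skipped groups 2 and 3 as "" rather than leaving them out.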
I hope you get the point. I tried as follows:
preg_match_all("/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/", "foo(dd)[sdfgh]", $matches)
for which $matches[0] is
Array
(
[0] => foo
[1] =>
[2] => dd
[3] =>
[4] =>
[5] => sdfgh
[6] =>
[7] =>
)
Why do I get the additional empty matches? How can I avoid them and get the results I need (either in $matches or in $matches[0]...)?
Upvotes: 5
Views: 266
Reputation: 6764
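You get so many results because the match starts over again and again: preg_match_all finds 8 matches. Every part of your pattern can match the empty string ([a-z]* matches zero or more letters, and the other two groups are optional), so at every position where nothing else fits (the '(', ')', '[' and ']' characters and the end of the string) the first, non-optional part of the regex, ([a-z]*), produces an empty match.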
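A quick way to see this is to look at the return value of preg_match_all (the number of matches found) and at where each match starts. The sketch below runs the question's pattern with the PREG_OFFSET_CAPTURE flag, which makes each entry a [text, offset] pair, so the empty matches show up at the bracket characters and at the end of the subject.

```php
<?php
// The question's pattern: every part of it can match the empty string.
$pattern = '/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/';
$subject = 'foo(dd)[sdfgh]';

// preg_match_all returns the number of (possibly empty) matches found.
$count = preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
echo "matches: $count\n";

// With PREG_OFFSET_CAPTURE each entry of $matches[0] is [matched text, byte offset].
foreach ($matches[0] as [$text, $offset]) {
    printf("offset %2d: '%s'\n", $offset, $text);
}
```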
The corrected regex:
preg_match_all("/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches);
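With the ^ and $ anchors the whole subject has to be consumed by a single match, so preg_match_all can find at most one match and the empty matches disappear. A short check of what this pattern captures; note that groups 2 and 3 still include the parentheses and the square brackets:

```php
<?php
$pattern = '/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/';
$count = preg_match_all($pattern, 'foo(ddd)[sdfgh]', $matches);

// Anchored at both ends (and without the m modifier), the pattern matches at most once.
echo "matches: $count\n";
// $matches[N][0] holds capture group N of that single match.
printf("%s | %s | %s\n", $matches[1][0], $matches[2][0], $matches[3][0]);
```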
EDIT (to exclude brackets in the second part of the subject)
So we want 'ddd' instead of '(ddd)':
This regex uses a "non-capturing pattern" (?: ... ) to mark an optional part of the subject without capturing it in the matches array.
preg_match_all("/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches);
The interesting part is this: (?:\(([a-zA-Z]*)\))?
(?: marks the beginning of a non-capturing subpattern
\( is an escaped literal parenthesis
( marks the beginning of a standard capturing subpattern
Only the contents of the third pair of parentheses show up in the $matches array.
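To check the effect of the non-capturing group, the sketch below runs the edited pattern on the same subject. The parentheses are gone from group 2, but group 3 is unchanged and still includes the square brackets. The second pattern applies the same (?: ... ) trick to the square brackets as well; that variant is an extrapolation for comparison, not part of the answer.

```php
<?php
$subject = 'foo(ddd)[sdfgh]';

// The edited pattern: "(?:" groups "(...)" without capturing the parens.
$edited = '/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/';
preg_match_all($edited, $subject, $m1);
echo $m1[2][0], ' ', $m1[3][0], "\n";

// Same idea applied to "[...]" too (an extrapolation, not from the answer).
$both = '/^([a-z]*)(?:\(([a-zA-Z]*)\))?(?:\[(\w*)\])?$/';
preg_match_all($both, $subject, $m2);
echo $m2[2][0], ' ', $m2[3][0], "\n";
```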
Upvotes: 1
Reputation: 173542
The optional last letter throws off the results a bit, but this expression will cover it:
function doit($s)
{
echo "==== $s ====\n";
preg_match_all('/(\w+) # first word
(?: \(([^)]+)\) )? # match optional (xyz)
(?: \[([^]]+)\])? # match optional [123]
(\w?) # match optional last char
/x', $s, $matches, PREG_SET_ORDER);
print_r($matches);
}
doit('abc(xyz)[123]R xyz(123)');
doit('abc(xyz)R');
doit('abc[123]R');
doit('abc(xyz)[123]');
Results
==== abc(xyz)[123]R xyz(123) ====
Array
(
[0] => Array
(
[0] => abc(xyz)[123]R
[1] => abc
[2] => xyz
[3] => 123
[4] => R
)
[1] => Array
(
[0] => xyz(123)
[1] => xyz
[2] => 123
[3] =>
[4] =>
)
)
==== abc(xyz)R ====
Array
(
[0] => Array
(
[0] => abc(xyz)R
[1] => abc
[2] => xyz
[3] =>
[4] => R
)
)
==== abc[123]R ====
Array
(
[0] => Array
(
[0] => abc[123]R
[1] => abc
[2] =>
[3] => 123
[4] => R
)
)
==== abc(xyz)[123] ====
Array
(
[0] => Array
(
[0] => abc(xyz)[123]
[1] => abc
[2] => xyz
[3] => 123
[4] =>
)
)
Upvotes: 0
Reputation: 91385
How about:
/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/
usage:
preg_match_all("/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/", "abc[123]R", $matches);
print_r($matches);
output:
Array
(
[0] => Array
(
[0] => abc[123]R
)
[1] => Array
(
[0] => abc
)
[2] => Array
(
[0] =>
)
[3] => Array
(
[0] => 123
)
[4] => Array
(
[0] => R
)
)
explanation:
The regular expression:
(?-imsx:^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \4 (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
)? end of \4 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \4)
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
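One thing the explanation makes visible: the first (?: group is marked "(optional ...)" but the [...] group is just "group, but do not capture", because there is no ? after it. The pattern therefore only matches subjects that contain a [...] part, so 'abc(xyz)R' and 'abc' from the question do not match. The sketch below checks this and tries a variant with ? added to the bracket group; the variant is my modification, not the answer's (it also drops the ? after (\w*), since \w* can already match nothing).

```php
<?php
// Pattern as given in the answer: the "[...]" group is mandatory.
$given   = '/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/';
// Hypothetical variant: "[...]" made optional with "?".
$variant = '/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])?(\w*)$/';

foreach (['abc(xyz)[123]R', 'abc(xyz)R', 'abc[123]R', 'abc'] as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf("%-15s given: %d  variant: %d\n", $s,
        preg_match($given, $s), preg_match($variant, $s));
}

// Groups produced by the variant for a subject without "[...]".
preg_match($variant, 'abc(xyz)R', $m);
print_r($m);
```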
----------------------------------------------------------------------
Upvotes: 1
Reputation: 14921
Why don't you use preg_split()?
$string = 'abc(xyz)[123]';
$array = preg_split('/\]?\(|\)\[?|\[|\]/', $string);
print_r($array);
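For illustration, here is the same preg_split call run over the question's sample strings. It returns the parts in order, but a missing (...) or [...] part simply disappears instead of leaving an empty slot, so an element's position no longer tells you which part it was. The loop and the $results array exist only for this demonstration.

```php
<?php
$samples = ['abc(xyz)[123]R', 'abc(xyz)R', 'abc[123]R', 'abc(xyz)[123]', 'abc'];
$results = [];
foreach ($samples as $string) {
    // Same delimiter pattern: "(", ")", "[", "]", ")[" and "](".
    $results[$string] = preg_split('/\]?\(|\)\[?|\[|\]/', $string);
    echo $string, ' => ', json_encode($results[$string]), "\n";
}
```

Note the trailing empty string for 'abc(xyz)[123]': the closing ] is a delimiter at the very end, and preg_split keeps the empty piece after it unless PREG_SPLIT_NO_EMPTY is passed.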
Upvotes: 0
Reputation: 89547
A way to obtain what you need without empty items:
$pattern = '~(?|\[(\w*+)]|\(([a-zA-Z]*+)\)|\b([a-z]*+)\b)~';
preg_match_all($pattern, 'foo(dd)[sdfgh]', $matches);
print_r($matches[1]);
Note: this can also match empty strings inside the brackets; to avoid that, replace * with +.
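The (?| ... ) construct is a PCRE "branch reset" group: the capture groups in each alternative are numbered from the same starting point, so whichever alternative matches, its text lands in group 1. The sketch below runs the answer's pattern next to a copy that uses a plain (?: group instead (my variant, for comparison only), where the three alternatives get groups 1 to 3.

```php
<?php
$subject = 'foo(dd)[sdfgh]';

// Branch reset: every alternative writes to group 1.
$reset = '~(?|\[(\w*+)]|\(([a-zA-Z]*+)\)|\b([a-z]*+)\b)~';
preg_match_all($reset, $subject, $a);

// Plain non-capturing group: the alternatives use groups 1, 2 and 3.
$plain = '~(?:\[(\w*+)]|\(([a-zA-Z]*+)\)|\b([a-z]*+)\b)~';
preg_match_all($plain, $subject, $b);

print_r($a[1]);        // all three parts in a single array
echo count($b), "\n";  // full match + three separate groups
```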
Upvotes: 0
Reputation: 7706
Try using this simple regex:
[a-zA-Z0-9]+
With preg_match_all it finds every substring that matches the pattern, so runs of letters and digits separated by parentheses, brackets or any other characters come back as separate matches.
preg_match_all("/[a-zA-Z0-9]+/", "foo(dd)[sdfgh]", $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => foo
[1] => dd
[2] => sdfgh
)
)
If you also need the parentheses and brackets themselves, you can capture them in separate groups like this:
([\(\)\[\]])?([a-zA-Z0-9]+)([\(\)\[\]])?
preg_match_all("/([\(\)\[\]])?([a-zA-Z0-9]+)([\(\)\[\]])?/", "foo(dd)[sdfgh]", $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => foo(
[1] => dd)
[2] => [sdfgh]
)
[1] => Array
(
[0] =>
[1] =>
[2] => [
)
[2] => Array
(
[0] => foo
[1] => dd
[2] => sdfgh
)
[3] => Array
(
[0] => (
[1] => )
[2] => ]
)
)
Upvotes: 0