Reputation: 1001
Given the subject
AB: CD:DEF: HIJ99:message packet - no capture
I have crafted the following regex to correctly capture the 2-5 character targets, each of which is followed by a colon.
/\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}/
which returns my matches even if erroneous spaces are added before or after the targets
[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99
However, if the message packet contains a colon in it anywhere, for example
AB: CD:DEF: HIJ99:message packet no capture **or: this either**
it of course includes [4] => or
in the resulting set, which is not desired. I want to limit the matches to a consecutive run from the beginning of the string and, once that run is broken, stop looking for more matches in the remainder.
Edit 1:
I also tried ^(\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}){1,5}
to force checking from the beginning of the string for multiple matches, but then I lose the individual matches: only the last repetition of each group is kept
[0] => Array
    (
        [0] => AB: CD:DEF: HIJ99:
    )
[1] => Array
    (
        [0] => HIJ99:
    )
[2] => Array
    (
        [0] => HIJ99
    )
Edit 2: Keep in mind that the subject is not fixed.
AB: CD:DEF: HIJ99:message packet - no capture
could just as easily be
ZY:xw:VU:message packet no capture or: this either
as far as the matches we are trying to pull are concerned, and the message packet varies as well. I am just trying to rule out matching a ":" that appears inside the message packet.
Upvotes: 2
Views: 71
Reputation: 174756
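You could use \G to do a consecutive string match. \G anchors each match attempt at the position where the previous match ended (or at the start of the subject for the first attempt), so matching stops at the first position where the pattern no longer fits.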
$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $str, $m);
print_r($m[1]);
Output:
Array
(
    [0] => AB
    [1] => CD
    [2] => DEF
    [3] => HIJ99
)
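Since the question notes that the subject is not fixed, here is a minimal sketch that wraps the same \G pattern in a function and runs it against both sample subjects from the question. The function name leadingTargets is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper (name is illustrative): returns the run of
// 2-5 character targets at the very start of the subject.
function leadingTargets(string $subject): array
{
    // \G ties every match attempt to the end of the previous match,
    // so the first non-matching position ends the run.
    preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $subject, $m);
    return $m[1];
}

print_r(leadingTargets('AB: CD:DEF: HIJ99:message packet - no capture'));
print_r(leadingTargets('ZY:xw:VU:message packet no capture or: this either'));
```

The first call should give AB, CD, DEF, HIJ99 and the second ZY, xw, VU; the later "or:" is never reached because the run already ended at "message". One caveat: if the message packet itself starts with a 2-5 character word followed directly by a colon, that word is indistinguishable from a target and will be captured too.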
Upvotes: 1
Reputation: 91488
How about:
$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $str, $m);
print_r($m);
Output:
Array
(
    [0] => Array
        (
            [0] => AB:
            [1] => CD:
            [2] => DEF:
            [3] => HIJ99:
        )

    [1] => Array
        (
            [0] => AB
            [1] => CD
            [2] => DEF
            [3] => HIJ99
        )

)
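It may be worth noting how the lookbehind behaves on other inputs. (?<![^:]{7}) only rejects a candidate when the seven characters before it contain no colon, so it acts as a distance heuristic rather than a true "consecutive from the start" check. The sketch below uses two made-up subjects (not from the question) to illustrate this, and compares against the \G pattern from the other answer.

```php
<?php
$lookbehind = '/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/';
$anchored   = '/\G\s*([0-9a-zA-Z]{2,5}):\s*/';

// Made-up subject: the stray colon is far from the last target,
// so the lookbehind rejects "or" as intended.
preg_match_all($lookbehind, 'AB: CD:message packet or: this either', $far);

// Made-up subject: the stray colon is within seven characters of
// the last target, so "ok" slips through the lookbehind.
preg_match_all($lookbehind, 'AB: CD:re ok: done', $near);

// The \G pattern stops at "re" because it is not followed by a colon.
preg_match_all($anchored, 'AB: CD:re ok: done', $strict);

print_r($far[1]);
print_r($near[1]);
print_r($strict[1]);
```

Under these assumptions, the lookbehind version returns AB, CD for the first subject but AB, CD, ok for the second, while the \G version returns AB, CD for the second subject.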
Upvotes: 1