Reputation: 5066
I'm trying to parse strings with a regex in PHP. The strings always have this format:
FooBar(,[0-9]{7}[0-9A-F]{8})+
In other words, they have a start value/word followed by one or more entries; each entry is one comma (,) followed by 7 digits and 8 hexadecimal characters (digits or uppercase letters A to F).
My regex to capture this is /^C7(,[0-9]{7}[0-9A-F]{8})+$/
which kind of works. When used in preg_match_all, it returns an array with two entries: the first is, as expected, the input string, but the second array contains only one entry, the last matched chunk (see Example).
I need to capture all the chunks matched by the capturing group. I did some research and found this answer, which seemed to be about the same issue: https://stackoverflow.com/a/2205009/2989952. So I adjusted my regex to /(,[0-9]{7}[0-9A-F]{8})+$/
but I still only get one match. This can be tested at regex101.com. I then experimented some more and found that if I change the input string to contain a space (or any other unmatched character) between the chunks, like this: C7,22801422CFE0F63 ,2280141C5EF0F63 ,22801402EFD0F63 ,2280138C5ED0F63 ,228024329897530 ,228023829877530
and adjust the regex once again to /(,[0-9]{7}[0-9A-F]{8})+/
it does exactly what it is intended to do!
Question: Is there a way to achieve this, matching all the chunks in this recurring group without adding whitespaces in between? If so, how?
To illustrate the problem:
No Whitespace https://regex101.com/r/ilkZjD/1
Whitespace/random chars https://regex101.com/r/mimBgz/1
Goal: the behaviour of the second one (the one with whitespace), but without adding the whitespace (or any other unmatched characters).
I kind of found a solution based on this answer: https://stackoverflow.com/a/3513858/2989952. The regex /(?:,)([0-9]{7}[0-9A-F]{8})/
works for me: https://regex101.com/r/LEEFzv/1. However, I'd still like a way to match the initial FooBar as well, since that indicates whether the incoming string should be matched with this regex at all.
(I know I could simply check the string with a second regex for this, but I would love to have it in one regex.)
Example:
Input: 'C7,22801422CFE0F63,2280141C5EF0F63,22801402EFD0F63,2280138C5ED0F63,228024329897530,228023829877530'
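The behaviour described above can be reproduced with a short script. This is a minimal sketch using the example input and the original pattern; the variable names are chosen here for illustration only. PCRE keeps just the last iteration of a quantified capturing group, which is why a single chunk comes back.

```php
<?php
// Example input from the question (no whitespace between chunks).
$input = 'C7,22801422CFE0F63,2280141C5EF0F63,22801402EFD0F63,2280138C5ED0F63,228024329897530,228023829877530';

// The whole string is ONE match; group 1 is overwritten on each
// repetition of (...)+, so only the final chunk survives.
$count = preg_match_all('/^C7(,[0-9]{7}[0-9A-F]{8})+$/', $input, $matches);

echo $count, PHP_EOL;         // 1
echo $matches[1][0], PHP_EOL; // ,228023829877530
```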
Upvotes: 0
Views: 197
Reputation: 129
Ehmmm... maybe I don't understand the problem, but your regex will work for the first scenario if you remove the trailing +:
(,[0-9]{7}[0-9A-F]{8})
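A rough sketch of how this could look in practice. On its own, the pattern without the trailing + no longer checks the C7 prefix, so this example pairs it with the separate validation step the asker mentioned; that pairing is an assumption of this sketch, not part of the answer above.

```php
<?php
$input = 'C7,22801422CFE0F63,2280141C5EF0F63,22801402EFD0F63,2280138C5ED0F63,228024329897530,228023829877530';

// Step 1: validate the whole line, including the C7 prefix.
$isValid = preg_match('/^C7(?:,[0-9]{7}[0-9A-F]{8})+$/', $input) === 1;

// Step 2: without the trailing +, every chunk is its own match,
// so preg_match_all collects all of them in group 1.
$chunks = [];
if ($isValid) {
    preg_match_all('/(,[0-9]{7}[0-9A-F]{8})/', $input, $m);
    $chunks = $m[1];
}

print_r($chunks); // six entries, each with its leading comma
```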
Upvotes: 1
Reputation: 89547
You can build a pattern to get contiguous matches using the A flag (that means Anchored). The main interest is that you can extract your values and check the format of the line at the same time using a lookahead:
$pattern = '~
(?!^) # fails at the start of the string
( \h*,\h* (?<value>[0-9]{7}[A-F0-9]{8}) )
# the first capture group is useful to shorten the
# lookahead in the second branch.
|
(?<first>[a-zA-Z0-9]+)(?=(?1)*$)
~xA';
if ( preg_match_all($pattern, $yourstring, $matches) ) {
echo $matches['first'][0], PHP_EOL;
print_r(array_values(array_filter($matches['value'])));
}
The A flag forces each match to start at the beginning of the string or at the end of the previous match.
The first branch describes a comma separated value and the second branch the start of the line.
The lookahead (?=(?1)*$)
checks the structure of the rest of the line. If it fails, no match is possible.
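To see the validation at work, the pattern can be wrapped in a small function and fed one well-formed and one malformed line. This is only a sketch: parseLine is a hypothetical helper name, and the shortened test strings are made up for illustration.

```php
<?php
// Hypothetical helper (not part of the answer): returns the prefix and
// the values, or null when the line does not fit the expected format.
function parseLine(string $line): ?array
{
    $pattern = '~
        (?!^)
        ( \h*,\h* (?<value>[0-9]{7}[A-F0-9]{8}) )
      |
        (?<first>[a-zA-Z0-9]+)(?=(?1)*$)
    ~xA';

    // With the A flag, a failure at offset 0 means no match at all.
    if (!preg_match_all($pattern, $line, $matches)) {
        return null;
    }

    return [
        'first'  => $matches['first'][0],
        'values' => array_values(array_filter($matches['value'])),
    ];
}

$ok  = parseLine('C7,22801422CFE0F63,2280141C5EF0F63');
$bad = parseLine('C7,22801422CFE0F63,XYZ');

var_dump($ok['first']); // string(2) "C7"
print_r($ok['values']); // 22801422CFE0F63 and 2280141C5EF0F63
var_dump($bad);         // NULL
```

Note that (?1)* also accepts a line with no entries at all (just the prefix). If at least one entry is required, using (?1)+ in the lookahead would be the variant to try.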
Upvotes: 0
Reputation: 163217
To capture all chunks including the first part, you could try:
(?:FooBar|(?:[0-9]{7}[0-9A-F]{8})+)
Explanation
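(?: Non-capturing group
FooBar Match FooBar literally
| Or
(?:[0-9]{7}[0-9A-F]{8})+ One or more repetitions of 7 digits followed by 8 hexadecimal characters
) Close non-capturing group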
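A quick way to try this, as a sketch, with FooBar replaced by the C7 prefix from the example input (that substitution is an assumption here):

```php
<?php
$input = 'C7,22801422CFE0F63,2280141C5EF0F63,22801402EFD0F63,2280138C5ED0F63,228024329897530,228023829877530';

// FooBar is the placeholder from the question; the example input uses C7.
// Each alternative produces its own match, so everything ends up in $m[0].
preg_match_all('/(?:C7|(?:[0-9]{7}[0-9A-F]{8})+)/', $input, $m);

print_r($m[0]); // C7 followed by the six 15-character chunks
```

Because the pattern is not anchored, it extracts the pieces but does not verify that the line as a whole has the expected format; anything between matches is simply skipped.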
Upvotes: 1
Reputation: 91385
Is that what you want?
$in = 'C7,22801422CFE0F63 ,2280141C5EF0F63 ,22801402EFD0F63 ,2280138C5ED0F63 ,228024329897530 ,228023829877530';
preg_match_all('/(^\w+|\G)\h*(,[0-9]{7}[0-9A-F]{8})/', $in, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => C7,22801422CFE0F63
[1] => ,2280141C5EF0F63
[2] => ,22801402EFD0F63
[3] => ,2280138C5ED0F63
[4] => ,228024329897530
[5] => ,228023829877530
)
[1] => Array
(
[0] => C7
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
)
[2] => Array
(
[0] => ,22801422CFE0F63
[1] => ,2280141C5EF0F63
[2] => ,22801402EFD0F63
[3] => ,2280138C5ED0F63
[4] => ,228024329897530
[5] => ,228023829877530
)
)
Explanation:
( : start group 1
^\w+ : beginning of line, 1 or more word characters
| : OR
\G : match from this point (where the previous match ended)
) : end group 1
\h* : 0 or more horizontal spaces
( : start group 2
, : a comma
[0-9]{7} : 7 digits
[0-9A-F]{8} : 8 hexadecimal characters
) : end group 2
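Since \h* matches zero or more spaces, the same approach should also cover the original input without whitespace. Below is a minimal sketch; as a small variation, the comma is moved out of group 2, and the malformed test string is made up for illustration.

```php
<?php
// Variation on the pattern above: the comma sits outside group 2.
$pattern = '/(^\w+|\G)\h*,([0-9]{7}[0-9A-F]{8})/';

$input = 'C7,22801422CFE0F63,2280141C5EF0F63,22801402EFD0F63,2280138C5ED0F63,228024329897530,228023829877530';
preg_match_all($pattern, $input, $m);
$prefix = $m[1][0]; // prefix captured by the first match
$chunks = $m[2];    // the chunks, commas excluded

// \G ties each match to the end of the previous one, so matching
// stops at the first malformed chunk instead of skipping over it.
preg_match_all($pattern, 'C7,22801422CFE0F63,BAD,2280141C5EF0F63', $broken);

echo $prefix, ' ', count($chunks), PHP_EOL; // C7 6
echo count($broken[2]), PHP_EOL;            // 1
```

The pattern does not check that the matches reach the end of the string, so comparing the number of chunks found with what is expected (or validating the line separately) may still be needed.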
Upvotes: 1