Reputation: 1149
Edit:
It seems people think I'm trying to parse HTML, although I've stressed a couple of times that I'm trying to parse logs; the <option> structure is just similar to my logs.
My logs look something like this:
!# [2013-03-04 14:51:31] // cluster1 BEGIN \\
!## apache: 41
!## mysql: 31
!## tomcat: 81
!## lotus: 985
!# [2013-03-04 14:51:56] // cluster1 END \\
!# [2013-03-04 14:51:56] // cluster2 BEGIN \\
!## apache: 13
!## mysql: 61
!## tomcat: 6
!## lotus: 513
!# [2013-03-04 14:52:13] // cluster2 END \\
I can't get this regexp to work; maybe it's not possible. I need help :)
Basically, I'm trying to match multiple child elements of a parent entity in one go. For the sake of brevity I'll use an HTML <select> dropdown as an example. This will actually be used for log parsing, but I'm not yet certain exactly what format the logs will have, and the dropdown element is as close as I can get to what I need without having to explain the structure of the logs.
So let's assume we have a dropdown:
<select class="parent">
<option value="1">First child</option>
<option value="2">Second child</option>
<option value="3">Third child</option>
...
</select>
To separate the <option> elements from the parent, I'd use this:
preg_match_all('/<select class="parent">(.*)<\/select>/is', $source, $matches);
Which is great. But now I have to do a second preg_match_all() to filter out my <option> elements, so it would look something like this:
preg_match_all('/<option value="(.*?)">(.*?)<\/option>/is', $matches[1][0], $finalMatches);
And I get my results just fine. But is there a way to combine the two commands into one rule, so that it finds the parent element (in this case a <select class="parent">*</select> block) and filters out each <option value="*">*</option> entry found within that parent? I'd then be left with a clean array of parent-child combinations, rather than iterating over the first result and running another preg_match_all() call in each iteration.
Upvotes: 1
Views: 1629
Reputation: 75222
I think this is what you're looking for:
preg_match_all(
'~(?:<select class="parent">|\G)\s*<option value="(.*?)">(.*?)</option>~i',
$source, $matches);
\G anchors the match to the position where the previous match ended (or to the beginning of the input if there was no previous match). So the first match will include the opening <select> tag and the first <option> element, and each match after that will contain the next <option> element. It won't skip ahead to pick up <option> elements from a later <select> element, unless that element is itself a <select class="parent">, which starts a new run of matches.
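Since the real target is the log format from the question, here is a minimal sketch of the same \G idea applied to it. The pattern, the `(?!\A)` guard, and the variable names are my own assumptions and have not been tried against your real logs. The first alternative matches a `BEGIN` header line and captures the cluster name; `\G` then chains the `!##` lines that follow it directly.

```php
<?php
// Sample input copied from the question. Nowdoc keeps the backslashes literal.
$log = <<<'LOG'
!# [2013-03-04 14:51:31] // cluster1 BEGIN \\
!## apache: 41
!## mysql: 31
!## tomcat: 81
!## lotus: 985
!# [2013-03-04 14:51:56] // cluster1 END \\
!# [2013-03-04 14:51:56] // cluster2 BEGIN \\
!## apache: 13
!## mysql: 61
!## tomcat: 6
!## lotus: 513
!# [2013-03-04 14:52:13] // cluster2 END \\
LOG;

// Either start at a BEGIN header (capturing the cluster name in group 1),
// or continue exactly where the previous match ended (\G).
// (?!\A) keeps \G from matching at the very start of the input.
$pattern = '~(?:^!# \[[^\]]*\] // (\S+) BEGIN\b.*\R|\G(?!\A))!## (\w+): (\d+)\R?~m';

preg_match_all($pattern, $log, $matches, PREG_SET_ORDER);

$byCluster = [];
$current = null;
foreach ($matches as $m) {
    // Group 1 is only filled on the first match of each block,
    // so remember the cluster name for the lines that follow.
    if ($m[1] !== '') {
        $current = $m[1];
    }
    $byCluster[$current][$m[2]] = (int) $m[3];
}

print_r($byCluster);
```

The loop is what turns the flat list of matches back into parent-child combinations: the regex alone only tells you which `!##` lines belong to some `BEGIN` block, not which one, so the cluster name has to be carried forward from the match that captured it.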
Here's a demo. I also used \K, the Match Start Reset operator, but that's not required; I just think it makes the output easier to read. It effectively turns everything before it into a positive lookbehind, without the usual limitations (such as the requirement that a lookbehind have a fixed length).
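To illustrate what \K changes, here is a small sketch. Where I put \K, the `(?!\A)` guard, and the sample input (including the unrelated `class="other"` select) are assumptions for illustration, not necessarily what the demo used.

```php
<?php
// Hypothetical input: two "parent" selects with an unrelated select between them.
$source = <<<'HTML'
<select class="parent">
<option value="1">First child</option>
<option value="2">Second child</option>
</select>
<select class="other">
<option value="x">Not wanted</option>
</select>
<select class="parent">
<option value="3">Third child</option>
</select>
HTML;

// \K drops everything matched so far from the reported match,
// so $matches[0] holds only the <option> elements themselves.
// (?!\A) is an added guard so \G cannot match at the very start of the input.
preg_match_all(
    '~(?:<select class="parent">|\G(?!\A))\s*\K<option value="(.*?)">(.*?)</option>~i',
    $source, $matches);

print_r($matches[0]); // whole matches, without the <select> tag or leading whitespace
print_r($matches[1]); // the value attributes
print_r($matches[2]); // the option labels
```

The option inside the `class="other"` select is never reported, because no match ends right before it and it is not preceded by the `<select class="parent">` tag.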
Upvotes: 2