I'm trying to figure out how to select every "-on_" inside each match using preg_match_all.
I've tried lots of regex patterns but I'm totally stumped. The best regex-er in our company has been working on this for an hour or two and can't make any headway either.
This one seems to be the most promising: .*(-on_).*
But it only catches the last "-on_" of each match. Also, the first match works correctly, but the second match is the rest of the page. I don't understand why.
An example of the HTML I'm trying to parse:
<span class="RatingStar__bew-avgstars__2enAh">
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
</div>
</span>
... more unimportant no-need-to-match code between ...
<span class="RatingStar__bew-avgstars__2enAh">
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
</div>
<div class="RatingStar__be-c-star__24d1B ">
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
<span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
</div>
</span>
What I'm using to parse it:
preg_match_all('~<span class="RatingStar__bew-avgstars__2enAh">.*(-on_).*</div></span>~', $html, $matches);
The response I'm getting is too large to be worth pasting, so I'll just summarize:
array:2 [▼
0 => array:2 [▼
0 => "Perfectly correct match"
1 => "Match of the rest of the page (not correct)"
]
1 => array:2 [▼
0 => "-on_" // Last on in the match
1 => "-on_" // Last on in the second match
]
]
For the two matches I should be getting, I expect a group of four "-on_"s per match with the HTML listed above.
So what I'm actually expecting is:
array:2 [▼
0 => array:2 [▼
0 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
1 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
]
1 => array:2 [▼
0 => ["-on_","-on_","-on_","-on_"]
1 => ["-on_","-on_","-on_","-on_"]
]
]
Maybe I'm completely missing something here... any advice?
I believe this is closer to what you want:
~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~
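Even with that pattern, the (-on_) group holds only one value per match, because PCRE keeps a single value per capturing group (the last iteration if the group repeats). So one preg_match_all call cannot return all four "-on_"s per block as nested arrays. A minimal two-pass sketch, assuming $html holds the page: first isolate each rating block with the pattern above minus the capture group, then run a second preg_match_all inside each block. The ratingBlock() helper and the 4-star/2-star sample are hypothetical, only there to rebuild markup shaped like the question's.

```php
<?php
// Hypothetical helper: rebuilds one rating block shaped like the question's
// markup, with the first $on stars "on" and the rest "off".
function ratingBlock(int $on, int $total = 5): string
{
    $out = "<span class=\"RatingStar__bew-avgstars__2enAh\">\n";
    for ($i = 0; $i < $total; $i++) {
        $state = $i < $on ? 'on' : 'off';
        $out .= "<div class=\"RatingStar__be-c-star__24d1B \">\n"
            . "<span><span class=\"RatingStar__be-star-off__2ks1e\">★</span></span>\n"
            . "<span><span class=\"RatingStar__be-star-{$state}__2ks1e\">★</span></span>\n"
            . "</div>\n";
    }
    return $out . "</span>\n";
}

// Two blocks separated by unrelated markup (sample data, not from a real page).
$html = ratingBlock(4) . "<p>... more unimportant markup ...</p>\n" . ratingBlock(2);

// Pass 1: isolate each rating block. Lazy [\s\S]*? stops at the first
// "</div>" that is followed only by whitespace and then "</span>".
$blockPattern = '~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?</div>\s*</span>~';
preg_match_all($blockPattern, $html, $blocks);

// Pass 2: collect every "-on_" inside each block separately.
$onsPerBlock = [];
foreach ($blocks[0] as $block) {
    preg_match_all('~-on_~', $block, $ons);
    $onsPerBlock[] = $ons[0];
}

print_r($onsPerBlock);
```

Here $onsPerBlock ends up with four "-on_" strings for the first block and two for the second, which is the nested shape asked for in the question. If only the number is needed, substr_count($block, '-on_') would do the second pass without a regex.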
You have three problems:
1. The . in .* does not match the newline character \n. You can use [\s\S]* instead, which matches every whitespace character and every non-whitespace character (so, every character).
2. </div></span> does not appear in your snippet. There is whitespace between the </div> and the </span>. Hence </div>\s*?</span>, which allows optional whitespace between the two tags.
3. You are using the greedy .* rather than the lazy operator *?. This is a problem because your entire string ends with </div></span>, which means the first match will consume all other matches and proceed to the end of the string.
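The three points can be checked in isolation with a small PHP sketch. The short strings and the <b> tags are made-up stand-ins for the real markup. It also shows the s modifier, an alternative to [\s\S] not mentioned above, and why a greedy .* in front of (-on_) leaves only the last occurrence in the group.

```php
<?php
// 1. "." does not match "\n" unless the s (DOTALL) modifier is set;
//    [\s\S] matches any character, newline included.
$twoLines = "a\nb";
$dot      = preg_match('~a.b~', $twoLines);       // 0
$anyChar  = preg_match('~a[\s\S]b~', $twoLines);  // 1
$dotAll   = preg_match('~a.b~s', $twoLines);      // 1

// 2. A literal "</div></span>" fails when whitespace sits between the tags.
$closing  = "</div>\n</span>";
$literal  = preg_match('~</div></span>~', $closing);     // 0
$flexible = preg_match('~</div>\s*</span>~', $closing);  // 1

// 3. Greedy .* runs to the LAST closing tag, so two blocks become one match;
//    lazy .*? stops at the FIRST closing tag.
$twoBlocks = '<b>1</b> <b>2</b>';
$greedy = preg_match_all('~<b>.*</b>~', $twoBlocks);   // 1
$lazy   = preg_match_all('~<b>.*?</b>~', $twoBlocks);  // 2

// A greedy .* before the group also explains "only the last -on_":
// .* backtracks just far enough for (-on_) to match the final occurrence.
preg_match('~.*(-on_)~', 'x-on_y-on_z', $m, PREG_OFFSET_CAPTURE);
// $m[1] is ['-on_', 6]: the group captured the second (last) occurrence.
```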