Jay

Reputation: 791

How to match all characters in a string and collect multiple groups?

I'm trying to figure out how to select all the "-on_"s in a specific match using preg_match_all.

I've tried lots of regex patterns but I'm totally stumped. The best regex-er in our company has been working on this for an hour or two and can't make any headway either.

This one seems the most promising: .*(-on_).* but it only catches the last "-on_" of each match. Also, the first match works correctly, but the second match is everything on the page. I don't understand why.

The example of the HTML I'm trying to parse...

<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

... more unimportant no-need-to-match code between ...


<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

What I'm using to parse it...

preg_match_all('~<span class="RatingStar__bew-avgstars__2enAh">.*(-on_).*</div></span>~', $html, $matches)

The response I'm getting is too large to be worth posting in full, so I'll just summarize:

array:2 [▼
  0 => array:2 [▼
    0 => "Perfectly correct match"
    1 => "Match of the rest of the page (not correct)"
  ]
  1 => array:2 [▼
    0 => "-on_" // Last "-on_" in the first match
    1 => "-on_" // Last "-on_" in the second match
  ]
]

With the HTML above I should get 2 matches, each with a group of 4 "-on_"s.

So what I'm actually expecting is:

array:2 [▼
  0 => array:2 [▼
    0 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
    1 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
  ]
  1 => array:2 [▼
    0 => ["-on_","-on_","-on_","-on_"] 
    1 => ["-on_","-on_","-on_","-on_"]
  ]
]

Maybe I'm completely missing something here... any advice?

Upvotes: 0

Views: 75

Answers (1)

Benjamin

Reputation: 1372

I believe this is closer to what you want:

~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~

You have three problems:

  1. .* does not match the newline character \n by default (. only matches newlines when the s modifier is set). You can use [\s\S]* instead, which matches every whitespace character and every non-whitespace character (so, every character).
  2. The text </div></span> does not appear in your snippet. There is whitespace between the </div> and the </span>. Hence, </div>\s*</span>.
  3. You are using the greedy quantifier * rather than the lazy quantifier *?. This is a problem because a greedy match runs on to the last </div> followed by </span> in the string, which means the first match will consume all other matches and proceed to the end of the string.
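
One thing the pattern above does not change: a capture group yields a single value per match, so (-on_) still cannot return an array of four "-on_"s for each block. A rough sketch of one way around that, assuming a two-pass approach is acceptable: first isolate each rating block with the lazy pattern, then run a second preg_match_all inside each block. The $html below is a shortened, made-up sample (two stars "on" per block rather than four), and the variable names are only illustrative.

```php
<?php
// Shortened, hypothetical sample: one rating block, reused twice below.
$block = <<<HTML
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
    </div>
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
        <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
    </div>
</span>
HTML;

// Two blocks with unrelated markup in between, as in the question.
$html = $block . "\n<p>unrelated markup</p>\n" . $block;

// Pass 1: isolate each rating block.
// [\s\S]*? is lazy and also crosses newlines, so each match stops at the
// first </div> that is followed (after optional whitespace) by </span>.
$blockPattern = '~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?</div>\s*</span>~';
preg_match_all($blockPattern, $html, $blocks);

// Pass 2: collect every "-on_" inside each isolated block.
$result = [];
foreach ($blocks[0] as $b) {
    preg_match_all('~-on_~', $b, $m);
    $result[] = $m[0];
}

print_r($result);
```

Here $blocks[0] holds the full text of each block and $result holds one array of "-on_" strings per block, which is roughly the shape asked for in the question. If only the number of filled stars matters, substr_count($b, '-on_') inside the loop would likely do without a second regex.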

Upvotes: 2
