Reputation: 2001
I am trying to figure out a regular expression for the following:
<tr class="A">.*</tr><tr class="(B|C)">.*</tr>
The second tr will repeat an unknown number of times, with something unknown in between repetitions, but simply wrapping it in parentheses and adding a plus doesn't work.
Here's the PHP code that didn't work:
$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
preg_match_all($pattern,$playerHtml,$scores);
But it only returns the first one.
Here's an example of something that should match:
<tr class="A">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="C">blah</tr>
This only matches blahblahblah
Upvotes: 1
Views: 181
Reputation: 12860
For your particular example, this regex will do:
/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/
Hope you can tweak it if need be. See the codepad demo here.
I needed to include \n newline characters for it to work.
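To illustrate why the newline handling matters, here is a minimal sketch, assuming the sample rows from the question are stored in a hypothetical $html variable. In PCRE the dot does not match a newline unless the s modifier is set, so a pattern without that modifier has to consume line breaks explicitly.

```php
<?php
// Sample input from the question; $html is a hypothetical variable name.
$html = "<tr class=\"A\">blah</tr>blah\n"
      . "<tr class=\"B\">blah</tr>blah\n"
      . "<tr class=\"B\">blah</tr>blah\n"
      . "<tr class=\"C\">blah</tr>";

// Without \n? (and without the s modifier) the dot stops at each line break,
// so the A row and a B/C row on a later line can never be in the same match.
$withoutNewlines = preg_match('/<tr class="A">.*<tr class="[BC]">/', $html);

// The pattern from this answer consumes the line breaks explicitly.
$pattern = '/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/';
$count = preg_match_all($pattern, $html, $matches);

// Escape the markup so a browser does not strip the stray TR elements.
echo htmlspecialchars($matches[0][0]), "\n";
```

Here $matches[0][0] holds the whole block from the A row through the last B or C row.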
Because they are TR elements outside of TABLE elements, I had a hard time seeing the result of the preg_match_all function (because my browser immediately stripped the random TR elements). You may have had similar problems. I used htmlspecialchars() in the demo to output the regex match.
Also, it's improper to have text between two TR elements:
<tr></tr>blah<tr></tr>
So you should be careful about doing that.
Upvotes: 1
Reputation: 197795
preg_match_all will look for your whole pattern multiple times. As it's found only once (I assume because the start is in $playerHtml only once), you only get one match.
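A small sketch of that behaviour, assuming the sample rows from the question. The pattern here is a simplified variant of my own (lazy quantifiers plus the s modifier so the dot crosses line breaks), not the one from the question. It shows that preg_match_all counts matches of the whole pattern, and that a repeated capturing group keeps only its last iteration.

```php
<?php
// Sample input from the question.
$playerHtml = "<tr class=\"A\">blah</tr>blah\n"
            . "<tr class=\"B\">blah</tr>blah\n"
            . "<tr class=\"B\">blah</tr>blah\n"
            . "<tr class=\"C\">blah</tr>";

// Simplified illustrative pattern: one A row followed by one or more B/C rows.
$pattern = '/<tr class="A">.*?<\/tr>(?:.*?(<tr class="[BC]">.*?<\/tr>))+/s';

// The whole pattern occurs once, so there is exactly one match.
$count = preg_match_all($pattern, $playerHtml, $scores);

// Group 1 was repeated three times, but only its last iteration is kept.
echo $count, "\n";
echo htmlspecialchars($scores[1][0]), "\n";
```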
Instead, first look for your whole pattern and extract the part you're interested in, then continue with that segment:
// Step 1: capture the run of B/C rows that follows the class="A" row.
// The s modifier lets the dot match newlines.
$pattern = '/<tr class="A">.*?<\/tr>((?:.*?<tr class="[BC]">.*?<\/tr>)+)/s';
$r = preg_match($pattern, $playerHtml, $matches);
// preg_match returns 0 when nothing matched and FALSE on error.
if (1 !== $r) throw new Exception('Regex failed.');
list(, $scoreHtml) = $matches;
// Step 2: collect each B or C row from that segment.
$r = preg_match_all('/<tr class="[BC]">.*?<\/tr>/s', $scoreHtml, $scores);
if (FALSE === $r) throw new Exception('Regex failed.');
This code was written quickly and is untested; it only illustrates that you need multiple steps.
However, if you use an HTML parser instead of regular expressions, I bet it's much quicker to obtain the values you're after with a small XPath query:
//tr[@class="B" or @class="C"]
This selects all <tr> elements with the classes you are looking for. Much easier.
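A minimal sketch of the XPath approach using PHP's DOMDocument and DOMXPath. The sample markup and the $html and $scores variable names are assumptions for illustration; the rows are wrapped in a table with td cells so the parser gets valid markup.

```php
<?php
// Hypothetical sample markup, not the asker's real page.
$html = '<table>'
      . '<tr class="A"><td>header</td></tr>'
      . '<tr class="B"><td>one</td></tr>'
      . '<tr class="B"><td>two</td></tr>'
      . '<tr class="C"><td>three</td></tr>'
      . '</table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Select every row whose class attribute is exactly B or C.
$xpath = new DOMXPath($doc);
$rows = $xpath->query('//tr[@class="B" or @class="C"]');

$scores = [];
foreach ($rows as $row) {
    $scores[] = trim($row->textContent);
}
print_r($scores);
```

Note that @class="B" compares the whole attribute value, so a row with class="B extra" would not be selected by this query.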
Upvotes: 0
Reputation: 33369
I can't test it, since I'm on my phone, but what do you get in $scores with this pattern?
<tr class="A">.*</tr><tr class="((B)|(C)|[^"]+)+">.*</tr>
Upvotes: 0
Reputation: 11028
Try:
<tr class="A">.*</tr><tr class="((B|C)\s*)+">.*</tr>
+ indicates one or more times and * indicates 0 or more times. Also, \s indicates a whitespace character.
((B|C)\s*)+ means there will be one or more blocks of (B|C)\s*
(B|C)\s* means there will be a string that starts with B or C, optionally followed by some whitespace.
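As a rough illustration of those quantifiers, this sketch tests the ((B|C)\s*)+ fragment on its own against a few hypothetical attribute values. The ^ and $ anchors are my addition so that the whole value has to match.

```php
<?php
// The fragment from the pattern above, anchored so the entire value must match.
$pattern = '/^((B|C)\s*)+$/';

// Hypothetical class attribute values to try.
$results = [];
foreach (['B', 'C', 'B C', 'B  C B', 'D', ''] as $value) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$value] = preg_match($pattern, $value);
}
print_r($results);
```

The values made up of B and C separated by optional whitespace match; 'D' and the empty string do not, because + requires at least one B or C block.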
Upvotes: 0