codersarepeople

Reputation: 2001

Regular Expression repetition of class

I am trying to figure out a regular expression for the following:

<tr class="A">.*</tr><tr class="(B|C)">.*</tr>

Now the second tr class will repeat an unknown number of times, with something unknown in between repetitions, but simply putting it in parentheses and adding a plus doesn't work.

Here's the PHP code that didn't work:

$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
preg_match_all($pattern,$playerHtml,$scores);

But it only returns the first.

Here's an example of something that should match:

<tr class="A">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="C">blah</tr>

This only matches blahblahblah

Upvotes: 1

Views: 181

Answers (4)

bozdoz

Reputation: 12860

For your particular example, this regex will do:

/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/

Hope you can tweak it if need be. See the codepad demo here.

I needed to include \n newline characters for it to work.
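As a rough sketch of why the \n? pieces matter, the snippet below runs the pattern with and without them against the sample from the question. It assumes the sample rows are separated by plain \n newlines; the variable names are made up for illustration.

```php
<?php
// Sample from the question, assumed to be separated by "\n" newlines.
$playerHtml = "<tr class=\"A\">blah</tr>blah\n"
            . "<tr class=\"B\">blah</tr>blah\n"
            . "<tr class=\"B\">blah</tr>blah\n"
            . "<tr class=\"C\">blah</tr>";

// The pattern from this answer, and the same pattern with the \n? parts removed.
$withNewline    = '/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/';
$withoutNewline = '/<tr class="A">.*?<\/tr>.*(<tr class="[BC]">.*?<\/tr>.*)+/';

// By default "." does not match a newline, so the second pattern
// cannot get from the end of one line to the next row.
$hitsWith    = preg_match($withNewline, $playerHtml, $m);
$hitsWithout = preg_match($withoutNewline, $playerHtml);

// Escape the match so a browser does not swallow the bare <tr> tags.
echo htmlspecialchars($m[0]), "\n";
```

The s modifier, which lets . match newlines, is another way around this, though a greedy .* will then span far more text, so lazy quantifiers become more important.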

Because they are TR elements outside of TABLE elements, I had a hard time seeing the result of the preg_match_all function (because my browser immediately stripped the random TR elements). You may have had similar problems. I used htmlspecialchars() in the demo to output the regex match.

Also, it's invalid HTML to have text directly between two TR elements:

<tr></tr>blah<tr></tr>

So you should be careful about doing that.

Upvotes: 1

hakre

Reputation: 197795

preg_match_all will look for your whole pattern multiple times.

As it's found only once (I assume because the start is in $playerHtml only once), you only get one match.
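A minimal sketch of that behaviour, assuming a made-up input that contains two A rows (the sample data and the simplified single-line pattern are illustrative, not taken from the question):

```php
<?php
// Hypothetical input: two "A" rows, each followed by B/C rows on the same line.
$playerHtml = '<tr class="A">a1</tr><tr class="B">b1</tr><tr class="C">c1</tr>' . "\n"
            . '<tr class="A">a2</tr><tr class="B">b2</tr>';

// Simplified pattern: an A row, then one or more B/C rows directly after it.
$pattern = '/<tr class="A">.*?<\/tr>(<tr class="(B|C)">.*?<\/tr>)+/';

$count = preg_match_all($pattern, $playerHtml, $scores);

// One entry per occurrence of the WHOLE pattern, i.e. one per A row.
print_r($scores[0]);

// A repeated capture group keeps only its last repetition per match.
print_r($scores[1]);
```

Each entry in $scores[0] is one full-pattern match, while $scores[1] holds only the last row that the repeated group matched. That is why the individual rows have to be pulled out in a second pass.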

Instead, first look for your whole pattern and extract the part you're interested in, then continue with that segment:

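// Step 1: capture the run of B/C rows that follows the A row.
// The "s" modifier lets "." match newlines; lazy ".*?" stops at the nearest </tr>.
$pattern = '/<tr class="A">.*?<\/tr>((?:.*?<tr class="(?:B|C)">.*?<\/tr>)+)/s';
$r = preg_match($pattern, $playerHtml, $matches);
if (FALSE === $r) throw new Exception('Regex failed.');
if (0 === $r) throw new Exception('No match.');

list(, $scoreHtml) = $matches;

// Step 2: pull each B/C row out of that segment.
$r = preg_match_all('/<tr class="(B|C)">(.*?)<\/tr>/s', $scoreHtml, $scores);
if (FALSE === $r) throw new Exception('Regex failed.');

This code is quickly written and untested against your real markup; it's just to illustrate that you need to do multiple steps.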

However, if you use an HTML parser instead of regular expressions, I bet it's much quicker to obtain the values you're after with a small XPath query:

//tr[@class="B" or @class="C"]

This selects all <tr> elements with the classes you look for. Much easier.
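A minimal sketch of how that query could be run with PHP's DOM extension. The sample markup is hypothetical: it wraps the rows in a table and drops the stray text between rows, since the parser may rearrange invalid markup.

```php
<?php
// Hypothetical sample: rows wrapped in a <table>, no stray text between them.
$playerHtml = '<table>'
    . '<tr class="A"><td>header</td></tr>'
    . '<tr class="B"><td>10</td></tr>'
    . '<tr class="B"><td>20</td></tr>'
    . '<tr class="C"><td>30</td></tr>'
    . '</table>';

// Collect libxml warnings internally instead of emitting them,
// since real-world markup is rarely perfect.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($playerHtml);
libxml_clear_errors();

// Select every <tr> whose class attribute is exactly "B" or "C".
$xpath = new DOMXPath($doc);
$rows  = $xpath->query('//tr[@class="B" or @class="C"]');

$values = [];
foreach ($rows as $row) {
    $values[] = trim($row->textContent);
}

print_r($values);
```

Note that @class="B" compares the whole attribute value, so a row with class="B extra" would not be selected by this query.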

Upvotes: 0

Abhi Beckert

Reputation: 33369

I can't test it, since I'm on my phone, but what do you get in $scores with this pattern?

<tr class="A">.*</tr><tr class="((B)|(C)|[^"]+)+">.*</tr>

Upvotes: 0

Seyeong Jeong

Reputation: 11028

Try:

 <tr class="A">.*</tr><tr class="((B|C)\s*)+">.*</tr>

+ indicates one or more times and * indicates 0 or more times. Also, \s indicates a whitespace character.

((B|C)\s*)+ means there will be one or more blocks of (B|C)\s*

(B|C)\s* means a string that starts with B or C, optionally followed by whitespace.
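To make the quantifiers concrete, here is a small sketch that tests the ((B|C)\s*)+ fragment on its own. The ^ and $ anchors and the sample strings are added only for illustration.

```php
<?php
// The fragment from the pattern above, anchored so the whole string must match.
$classPattern = '/^((B|C)\s*)+$/';

// Hypothetical class attribute values to try.
$samples = ['B', 'C', 'B C', 'BC  B', 'A', ''];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$sample] = preg_match($classPattern, $sample) === 1;
}

var_dump($results);
```

The + here repeats B or C tokens inside a single class attribute value; it does not repeat whole tr rows.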

Upvotes: 0
