Andrew P.

Reputation: 148

Nesting the results of a regular expression

I'm parsing some HTML like this:

<h3>Movie1</h3>
<div class="time"><span>10:00</span><span>12:00</span></div>
<h3>Movie2</h3>
<div class="time"><span>13:00</span><span>15:00</span><span>18:00</span></div>

I'd like to get a result array that looks like this:

0 => 
  0 => Movie1
  1 => Movie2
1 =>
  0 => 
    0 => 10:00
    1 => 12:00
  1 => 
    0 => 13:00
    1 => 15:00
    2 => 18:00

I can do that in two steps:

1) Get the movie name and the movie's whole schedule, tags included, with a regexp like this:

~<h3>(.*?)</h3>(?:.*?)<div class="time">(.*?)</div>~s

2) Get the times with a regexp like this (I run it inside a loop for every movie I got in step 1):

~<span>([0-9]{2}:[0-9]{2})</span>~s
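
The two steps above can be sketched roughly as follows. This is a minimal illustration, assuming the markup is held in a variable named `$html` (a name chosen here for the example) and that `preg_match_all` is used for both passes.

```php
<?php
// Assumption: $html holds the markup shown in the question.
$html = '<h3>Movie1</h3>
<div class="time"><span>10:00</span><span>12:00</span></div>
<h3>Movie2</h3>
<div class="time"><span>13:00</span><span>15:00</span><span>18:00</span></div>';

$result = [[], []];

// Step 1: capture each title and the raw contents of its schedule block.
preg_match_all(
    '~<h3>(.*?)</h3>(?:.*?)<div class="time">(.*?)</div>~s',
    $html,
    $movies,
    PREG_SET_ORDER
);

foreach ($movies as $movie) {
    $result[0][] = $movie[1];

    // Step 2: extract every time from this movie's schedule block.
    preg_match_all('~<span>([0-9]{2}:[0-9]{2})</span>~s', $movie[2], $times);
    $result[1][] = $times[1];
}

print_r($result);
```
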

This works well. My question is: is there a regular expression that gives me the same result in a single step?

I tried nested groups like this:

~<h3>(.*?)</h3>(?:.*?)<div class="time">((<span>(.*?)</span>)*)</div>~s

but I got only the last time for each movie (only 12:00 and 18:00).

Upvotes: 1

Views: 36

Answers (1)

Casimir et Hippolyte

Reputation: 89584

With DOMDocument:

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

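// Select titles and times together; the nodes come back in document order.
$nodeList = $xpath->query('//h3|//div[@class="time"]/span');
$result = array();
$currentMovie = -1;

foreach ($nodeList as $node) {
    if ($node->nodeName === 'h3') {
        // A new title starts the next movie entry.
        $result[0][++$currentMovie] = $node->nodeValue;
        continue;
    }
    // Any other node is a <span> time belonging to the current movie.
    $result[1][$currentMovie][] = $node->nodeValue;
}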

print_r($result);

Note: to be more rigorous, you can change the XPath query so that it only selects h3 elements that have a following sibling div with the "time" class:

//h3[following-sibling::div[@class="time"]] | //div[@class="time"]/span
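
As a rough illustration of what the stricter query changes, the sketch below appends a hypothetical `<h3>Coming soon</h3>` heading with no schedule after it. That heading is not in the original markup and is added here only for the example. With the stricter query it should be skipped, whereas the simpler `//h3` query would add it to the title list with no matching times.

```php
<?php
// Assumption: the question's markup plus one extra heading without a schedule.
$html = '<h3>Movie1</h3>
<div class="time"><span>10:00</span><span>12:00</span></div>
<h3>Movie2</h3>
<div class="time"><span>13:00</span><span>15:00</span><span>18:00</span></div>
<h3>Coming soon</h3>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Only keep <h3> elements that have a schedule <div> somewhere after them
// among their siblings.
$nodeList = $xpath->query(
    '//h3[following-sibling::div[@class="time"]] | //div[@class="time"]/span'
);

$result = array();
$currentMovie = -1;

foreach ($nodeList as $node) {
    if ($node->nodeName === 'h3') {
        $result[0][++$currentMovie] = $node->nodeValue;
        continue;
    }
    $result[1][$currentMovie][] = $node->nodeValue;
}

print_r($result);
```

Note that `following-sibling` matches any later sibling, not only the next one, so a stray heading placed before the movies would still be selected.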

Upvotes: 1
