Hassaan

Reputation: 7662

How to get data from HTML using regex

I have the following HTML:

<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>

I want to get the value of <div class="statnum"> inside <td class="stat stat-last">, which is 22 here.

I tried the following regex, but it does not find any match.

/<div\sclass="statnum">^(.)\?<\/div>/ig

Upvotes: 0

Views: 163

Answers (4)

chris85

Reputation: 23892

Here's a way to accomplish this using a parser.

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { //loop through all Cells
    // strpos() returns 0 (falsy) when the match is at the start, so compare against false
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) {
        $divs = $cell->getElementsByTagName('div');
        foreach($divs as $div) { // loop through all divs of the cell
            if($div->getAttribute('class') == 'statnum'){
                echo $div->nodeValue;
            }
        }
    }
}

Output:

22

...or using an xpath...

$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
    echo $statnum->nodeValue;
}

Output:

22
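
The XPath above compares the whole class attribute, so it only matches when the attribute is exactly "stat stat-last". A minimal sketch of a token-based match follows, assuming the class order or extra classes might vary; the sample markup here is a trimmed, hypothetical variant of the question's table with the two classes swapped.

```php
<?php
// Hypothetical sample: same data as the question, but with the class order swapped.
$html = '<table class="profile-stats"><tr>
  <td class="stat"><div class="statnum">8</div></td>
  <td class="stat-last stat">
    <a href="/THEDJMHA/followers"><div class="statnum">22</div></a>
  </td>
</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the class attribute with spaces so "stat-last" is matched as a whole token.
$query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
       . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";

$statnum = null;
foreach ($xpath->query($query) as $node) {
    $statnum = trim($node->nodeValue);
}
echo $statnum;
```

The `//div` step also drops the assumption that the div sits directly inside an `<a>` element.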

...or, if you really want to use a regex...

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num);
echo $num[1];

Output:

22

Regex demo: https://regex101.com/r/kM6kI2/1

Upvotes: 3

jmrah

Reputation: 6222

/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si

Your match is in the first capture group. Note the s modifier at the end: it makes '.' match newline characters too, so .*? can span the lines between the <td> and the <div>.
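
As a rough sketch of how this pattern could be used from PHP (assuming PHP, as in the other answers; the variable names are only illustrative):

```php
<?php
// Markup from the question, inlined so the example is self-contained.
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
        </a>
    </td>
  </tr>
</table>';

// s: "." also matches newlines; i: case-insensitive matching.
$pattern = '/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si';

$followers = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[0] is the whole match, $matches[1] the first capture group.
    $followers = (int) $matches[1];
}
echo $followers;
```

Without the s modifier, `.*?` stops at the first line break and the pattern finds nothing in this markup.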

Upvotes: 2

mocak

Reputation: 405

You can change your pattern like this:

/<div\sclass="statnum">(.*?)<\/div>/ig
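
Note that this pattern matches every statnum div, not only the one in the stat-last cell. If you are in PHP (an assumption, since the question does not say), the preg functions do not accept the g modifier; preg_match_all() does the global matching instead. A minimal sketch that collects all values and takes the last one, assuming the stat-last cell always comes last:

```php
<?php
// Markup from the question, trimmed to the relevant cells.
$html = '<table class="profile-stats">
  <tr>
    <td class="stat"><div class="statnum">8</div></td>
    <td class="stat"><a href="/THEDJMHA/following"><div class="statnum">13</div></a></td>
    <td class="stat stat-last"><a href="/THEDJMHA/followers"><div class="statnum">22</div></a></td>
  </tr>
</table>';

// Same pattern as above, minus the "g" flag, which PHP's PCRE functions reject.
$pattern = '/<div\sclass="statnum">(.*?)<\/div>/i';

// preg_match_all() returns the number of full matches and fills $matches.
$count = preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured values in document order.
$last = $count > 0 ? $matches[1][$count - 1] : null;
echo $last;
```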

Upvotes: 1

veta

Reputation: 76

I think it would be better if you use an XML parser for that instead of regex. SimpleXML can do the job for you: http://php.net/manual/en/book.simplexml.php
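
A minimal sketch of what that could look like, under the assumption that the snippet is well-formed XML (true for the table in the question, but often not for real-world HTML, where DOMDocument::loadHTML() is more forgiving):

```php
<?php
// Markup from the question, trimmed; it must be well-formed XML for SimpleXML to parse it.
$html = '<table class="profile-stats">
  <tr>
    <td class="stat"><div class="statnum">8</div></td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
        </a>
    </td>
  </tr>
</table>';

$value = null;
$xml = simplexml_load_string($html);
if ($xml !== false) {
    // Match "stat-last" as a whole class token, then find the statnum div below it.
    $nodes = $xml->xpath(
        "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
        . "//div[@class='statnum']"
    );
    if (!empty($nodes)) {
        // Casting a SimpleXMLElement to string yields its text content.
        $value = trim((string) $nodes[0]);
    }
}
echo $value;
```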

Upvotes: 2
