Filippo Schiera

Web scraping with PHP and HTML DOM Parser

I'm trying to scrape the site referenced in the code below, but I would like the output in table format.

<?php
$url = 'http://www.arbworld.net/en/moneyway';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->validateOnParse = false;
$dom->recover = true;
$dom->strictErrorChecking = false;
$dom->loadHTMLFile($url);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$col = $xp->query('//table[@class="grid"]/tr[@class="belowHeader"]/td');

if ($col->length > 0) {
    foreach ($col as $node) {
        echo $node->textContent;
    }
}

Now the output is this:

Romanian Liga I22.Dec 18:00:00 FCSBUniversitat2.063.33.999.9 %€ 2070.1 %€ 00 %€ 0€ 207 22.Dec 18:00:00 Italian Serie A22.Dec 11:30:00 AtalantaAC Milan1.8844.499.7 %€ 21 5580.1 %€ 170.2 %€ 46€ 21 622 22.Dec 11:30:00 English League 221.Dec 15:0 0:00

Answers (1)

Jeto

You should retrieve the rows instead of the columns (without the /td at the end), then simply put everything into an HTML table, with one <tr> for each row:

<?php
// your current code

$xp = new DOMXPath($dom);
$rows = $xp->query('//table[@class="grid"]/tr[@class="belowHeader"]');
?>

<table>
  <tbody>
  <?php foreach ($rows as $row): ?>
    <tr>
    <?php foreach ($row->childNodes as $col): ?>
      <?php // childNodes also contains whitespace text nodes, which have no getAttribute() ?>
      <?php if ($col instanceof DOMElement && $col->getAttribute('style') !== 'display:none'): ?>
        <?php foreach ($col->childNodes as $colPart): ?>
          <?php if (($colText = trim($colPart->textContent)) !== ''): ?>
          <td><?= htmlspecialchars($colText) ?></td>
          <?php elseif ($colPart instanceof DOMElement && $colPart->tagName === 'a'): ?>
            <?php
            $href = $colPart->getAttribute('href');
            // skip "javascript:" pseudo-links
            if (strpos($href, 'javascript') !== 0):
            ?>
            <td><?= htmlspecialchars($href) ?></td>
            <?php endif ?>
          <?php endif ?>
        <?php endforeach ?>
      <?php endif ?>
    <?php endforeach ?>
    </tr>
  <?php endforeach ?>
  </tbody>
</table>
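A variant of the same idea, as a rough sketch: instead of echoing directly inside the template, you can first collect the rows into a plain PHP array (one array of cell texts per row) and render it afterwards, which also makes it easy to write CSV or reuse the data. The function name `extractGridRows` and the sample markup are hypothetical; the sample only mimics the `grid` / `belowHeader` structure the XPath above expects, and it is parsed from an inline string rather than the live URL, whose markup may differ.

```php
<?php
// Sketch: collect each "belowHeader" row of a "grid" table into an array
// of visible cell texts. extractGridRows() is a hypothetical helper name.
function extractGridRows(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $rows = [];
    foreach ($xp->query('//table[@class="grid"]/tr[@class="belowHeader"]') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $td) {
            // Skip whitespace text nodes, non-<td> children and hidden cells
            if (!($td instanceof DOMElement) || $td->tagName !== 'td'
                || $td->getAttribute('style') === 'display:none') {
                continue;
            }
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Hypothetical sample markup standing in for the real page
$sample = <<<'HTML'
<html><body>
<table class="grid">
  <tr class="belowHeader">
    <td>League A</td>
    <td style="display:none">hidden</td>
    <td>Team A</td>
    <td>Team B</td>
  </tr>
  <tr class="belowHeader">
    <td>League B</td>
    <td>Team C</td>
    <td>Team D</td>
  </tr>
</table>
</body></html>
HTML;

$rows = extractGridRows($sample);

// Render the collected rows as an HTML table, escaping each cell
echo "<table>\n";
foreach ($rows as $cells) {
    echo '  <tr>';
    foreach ($cells as $cell) {
        echo '<td>', htmlspecialchars($cell), '</td>';
    }
    echo "</tr>\n";
}
echo "</table>\n";
```

Separating extraction from rendering keeps the DOM-walking logic in one place; with the real page you would pass the fetched HTML (or reuse the `$dom` from the question) instead of `$sample`.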
