Reputation: 55
I am trying to scrape a table using PHP. I've managed to scrape it, but I get everything in the webpage's table. I am unsure how to specify which TDs and/or TRs I want to scrape.
Here's the PHP code:
<?php
include("simple_html_dom.php");
$html=file_get_html("http://www.premierleague.com/en-gb/matchday/league-table.html");
$html=new simple_html_dom($html);
foreach($html->find('table tr') as $row) {
$cell = $row->find('td', 0);
echo $row;
}
?>
What I want to get (if you view the website) is: Club name, played, won, lost, goals for, goals against, goal difference, and points.
What I get is everything in the table, including the collapsed team information. It looks like this (I'm not sure a picture is the best way to post it, but I don't know how else to show it; I highlighted the part that I actually want scraped):
Upvotes: 3
Views: 2549
Reputation: 8685
Have you tried looking at the advanced usage of Simple HTML DOM Parser?
I wrote this based on the manual at the link above; it might point you in the right direction:
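require "simple_html_dom.php";
// file_get_html() already returns a parsed simple_html_dom object (or false on failure),
// so wrapping the result in new simple_html_dom() is not needed.
$html = file_get_html("http://www.premierleague.com/en-gb/matchday/league-table.html");
if (!$html) {
    die("Could not load the league table page.");
}
$rows = array();
// Only the club rows inside the league table
foreach ($html->find('table.leagueTable tr.club-row') as $tr) {
    $row = array();
    // Only the cells of interest within each row
    foreach ($tr->find('td.col-club,td.col-p,td.col-w,td.col-l,td.col-gf,td.col-ga,td.col-gd,td.col-pts') as $td) {
        $row[] = $td->innertext;
    }
    $rows[] = $row;
}
var_dump($rows);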
Essentially, you want all the <tr> elements which have a class of club-row (adding a . indicates class); furthermore, you only want rows which are nested within the <table> with class leagueTable. That's what the first find is doing. The space after the table indicates you want descendants of it.
Next, you want the <td> elements which have the various classes you mentioned. Separating the selectors with a comma selects the union: any <td> that matches at least one of them. (Give me every td.col-club and every td.col-p and ...)
The foreach loops simply walk through those parsed DOM elements and add their innertext to an array. You can do whatever you like with them after that.
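If Simple HTML DOM is not available, roughly the same selection can be sketched with PHP's bundled DOMDocument and DOMXPath. This is a different technique from the find() calls above, not that library's API. The sample markup, the team names and numbers, and the hasClass() helper are all made up for illustration; only the class names (leagueTable, club-row, col-club, col-p, col-pts) come from the answer.

```php
<?php
// Made-up sample markup; the real page's structure may differ.
$sample = <<<HTML
<table class="leagueTable">
  <tr class="club-row"><td class="col-pos">1</td><td class="col-club">Team A</td><td class="col-p">10</td><td class="col-pts">26</td></tr>
  <tr class="club-details"><td colspan="4">collapsed team information</td></tr>
  <tr class="club-row"><td class="col-pos">2</td><td class="col-club">Team B</td><td class="col-p">10</td><td class="col-pts">22</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath has no class selector, so match the class as a whole word.
function hasClass(string $name): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$wanted = ['col-club', 'col-p', 'col-pts'];
$rows = [];

// Equivalent of "table.leagueTable tr.club-row": club rows that descend from the league table.
$rowQuery = '//table[' . hasClass('leagueTable') . ']//tr[' . hasClass('club-row') . ']';
foreach ($xpath->query($rowQuery) as $tr) {
    $row = [];
    foreach ($wanted as $class) {
        // Relative query: the first matching cell inside this row only.
        $td = $xpath->query('./td[' . hasClass($class) . ']', $tr)->item(0);
        $row[$class] = $td ? trim($td->textContent) : null;
    }
    $rows[] = $row;
}

print_r($rows);
```

Keying each row by class name rather than by position means a missing cell shows up as null instead of shifting the remaining values.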
Upvotes: 2
Reputation: 1237
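$output = array();
foreach ($html->find('table', 0)->find('tr') as $row) {
    $club = $row->find('.col-club', 0);
    $p = $row->find('.col-p', 0);
    // Rows without these cells (headers, collapsed details) make find() return null.
    if ($club === null || $p === null) {
        continue;
    }
    $output[] = array("club" => $club->innertext, "p" => $p->innertext);
}
var_dump($output);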
This is what I would do.
EDIT: the traversing part:
foreach ($output as $row) {
    foreach ($row as $key => $value) {
        // <br> is the valid line-break tag; </br> is not.
        echo $key . "|||" . $value . "<br>";
    }
    echo "<br>";
}
EDIT: I forgot to extract the innertext.
Upvotes: 1
Reputation: 5071
Playing around with this solution a little may produce the results you want. I tried it for one class, and it fetches the results for one row. Check whether it is the solution you are looking for:
<?php
$grab = file_get_contents("http://www.premierleague.com/en-gb/matchday/league-table.html");
$first = explode( '<td class="col-sort">' , $grab );
$second = explode("</td>" , $first[1] );
?>
<table style="width:80%">
<tr>
<td><?php echo $second[1]; ?> (LP)</td>
<td><?php echo $second[2]; ?> (Club)</td>
<td><?php echo $second[3]; ?> (P)</td>
<td><?php echo $second[4]; ?> (W)</td>
<td><?php echo $second[5]; ?> (D)</td>
</tr>
</table>
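To see what the two explode() calls leave behind, here is a minimal sketch run against a made-up row instead of the live page. The sample markup and its values are assumptions, and the strip_tags()/trim() cleanup is an addition for illustration rather than part of the approach above.

```php
<?php
// Made-up row; the real page's markup may differ.
$grab = '<tr><td class="col-sort">1</td><td class="col-club">Team A</td>'
      . '<td class="col-p">10</td><td class="col-w">8</td><td class="col-d">2</td></tr>';

// Keep everything after the first opening col-sort cell...
$first = explode('<td class="col-sort">', $grab);
// ...and split it at each closing </td>. Later pieces still carry their opening <td ...> tag.
$second = explode('</td>', $first[1] ?? '');

// strip_tags() drops the leftover markup, trim() the surrounding whitespace.
$cells = array_map(function ($piece) {
    return trim(strip_tags($piece));
}, $second);

// Discard empty pieces such as the trailing "</tr>" remainder; 'strlen' keeps a literal "0".
$cells = array_values(array_filter($cells, 'strlen'));

print_r($cells);
```

String splitting like this depends on the exact markup, so any change to the page's class names or attribute order will break it; a DOM parser is usually the more robust choice.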
Upvotes: 1