user1356029
user1356029

Reputation: 381

Extracting data from HTML using Simple HTML DOM Parser

For a college project, I am creating a website with some back end algorithms and to test these in a demo environment I require a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelance.com.To extract the data I am using the Simple HTML DOM Parser but so far I have been unsuccessful in my efforts to actually get the data I need.

Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.

Screenshot of HTML Code on Freelance.com

Here is the code I have written so far after following some tutorials.

<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {

    foreach($tr->find('td[class=title-col]') as $t) {
        //get the inner HTML
        $data = $t->outertext;
        echo $data;
    }
}

?>

Hopefully someone can point me in the right direction as to how I can get this working.

Thanks.

Upvotes: 1

Views: 4650

Answers (1)

Enissay
Enissay

Reputation: 4953

The raw source code is different, that's why you're not getting the expected results...

You can check the raw source code using ctrl+u, the data are in table[id=project_table_static], and the cells td have no attributes, so, here's a working code to get all the URLs from the table:

$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {

    // Skip the first empty element
    if ($i==0) {
        continue;
    }

    echo "<br/>\$i=".$i;

    // get the first anchor
    $anchor = $tr->find('a', 0);
    echo " => ".$anchor->href;
}

// Clear dom object
$html->clear(); 
unset($html);

Demo

Upvotes: 1

Related Questions