Sami

Reputation: 1491

Collect web data using Simple HTML Dom from multiple pages

I used the code below and successfully collected data from a specific page:

    include 'simplehtmldom/simple_html_dom.php';

    $html = file_get_html('http://test.com/file/1209i0329/');

    // Find all article blocks
    foreach($html->find('div.Content') as $file) {
        $item['date']     = $file->find('id.article-date', 0)->plaintext;
        $item['location'] = $file->find('id.article-location', 0)->plaintext;
        $item['price']    = $file->find('div.article', 0)->plaintext;
        $files[] = $item;
    }

    print_r($files);

The code works well for http://test.com/file/1209i0329.php, but my goal is to collect data from every page on this domain whose URL starts with http://test.com/file/ (for example, http://test.com/file/1209i0329/, http://test.com/file/120dnkj329/, and so on). Is there a way to do this with simple_html_dom?

Upvotes: 1

Views: 2481

Answers (1)

user1978142

Reputation: 7948

I don't know where the pages you want to scrape live (on the same domain or elsewhere), but you will probably need to loop over an array containing the URLs you want to fetch.

Consider this example:

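    include 'simplehtmldom/simple_html_dom.php';

    // Fetching several pages one after another will most likely take some time.

    $files = array();
    $urls = array(
        'http://test.com/file/1209i0329/',
        'http://test.com/file/120dnkj329/',
        'http://en.wikipedia.org/wiki/',
    );

    foreach ($urls as $url) {

        $html = file_get_html($url);
        if (!$html) {
            continue; // skip pages that could not be fetched or parsed
        }

        // Selectors are copied from the question. Note that 'id.article-date'
        // matches an <id> tag with class "article-date"; if an element id was
        // intended, the selector would be '#article-date'.
        // Find all article blocks
        foreach ($html->find('div.Content') as $file) {
            $item = array(); // start fresh so values never leak between blocks
            $item['date']     = $file->find('id.article-date', 0)->plaintext;
            $item['location'] = $file->find('id.article-location', 0)->plaintext;
            $item['price']    = $file->find('div.article', 0)->plaintext;
            $files[] = $item;
        }

        $html->clear(); // release the parsed DOM before loading the next page
    }

    print_r($files);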
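
If the URLs are not known in advance, one option is to build the `$urls` array from the links found on a listing page. This is only a minimal sketch: `collectFileUrls()` is a hypothetical helper, not part of simple_html_dom, and it assumes that some page on the site links to every `/file/...` page and that relative links are root-relative (they start with `/`).

```php
<?php
// Hypothetical helper (not part of simple_html_dom): keep only the links
// that live under $prefix, resolving root-relative links against $base.
function collectFileUrls(array $hrefs, $base, $prefix)
{
    $urls = array();
    foreach ($hrefs as $href) {
        $href = trim((string) $href);
        if ($href === '') {
            continue; // ignore empty href attributes
        }
        // Turn "/file/abc/" into "http://test.com/file/abc/".
        if ($href[0] === '/') {
            $href = rtrim($base, '/') . $href;
        }
        // Keep links below the prefix, but not the prefix page itself.
        if (strpos($href, $prefix) === 0 && $href !== $prefix) {
            $urls[$href] = true; // array keys de-duplicate repeated links
        }
    }
    return array_keys($urls);
}

// Sample link values, as they might appear on a listing page (assumed data).
$hrefs = array(
    '/file/1209i0329/',
    'http://test.com/file/120dnkj329/',
    '/about/',
    '/file/1209i0329/',
);

$urls = collectFileUrls($hrefs, 'http://test.com', 'http://test.com/file/');
print_r($urls);
```

With simple_html_dom, the `$hrefs` array could be filled from a listing page with something like `foreach ($html->find('a') as $a) { $hrefs[] = $a->href; }`, and the resulting `$urls` then fed to the loop above. Whether such a listing page exists depends on the site.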

Upvotes: 3
