j0h

Reputation: 1784

How do I find a tag with simple_html_dom?

I'm trying to use simple_html_dom with PHP to parse a web page that contains this tag:

<div class="  row  result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">

Here data-tn-component="organicJob" is the attribute I want to select on, but I can't seem to write the selector in a way that simple_html_dom recognizes.

I've tried a few things along these lines:

<?php
include 'simple_html_dom.php';
$f = "http://www.indeed.com/jobs?q=Electrician&l=maine";
$html->load_file($f);
foreach ($html->find('div[data-tn-component="organicJob"]') as $div) {
    echo $div->innertext;
}
?>

but the parser doesn't find any of the results, even though I know they are there. I'm probably not specifying the element I want correctly. I've looked at the API, but I still don't understand how to format the find string. What am I doing wrong?

Upvotes: 1

Views: 183

Answers (1)

Armen

Reputation: 4202

Your selector is correct, but I see other problems in your code:

1) You are missing .php in your include: include 'simple_html_dom'; should be

include '/absolute_path/simple_html_dom.php';

2) To load content from a URL, use the file_get_html function instead of $html->load_file($f);. That call fails because $html has never been created, so PHP doesn't know it is a simple_html_dom object.

$html = file_get_html('http://www.google.com/');
// only then call
$html->find( ...

3) The page at your provided link, http://www.indeed.com/jobs?q=Electrician+Helper&l=maine, contains no element with a data-tn-component attribute.
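
One way to check point 3 before touching the selector is to look at the raw markup the server actually returned. Below is a minimal sketch, assuming the page has already been fetched into a string. The helper name containsAttribute and the sample markup are hypothetical, made up for illustration; it is a plain pattern check, not a parse.

```php
<?php
// Hypothetical helper: reports whether an attribute name appears in the
// raw HTML, preceded by whitespace and followed by "=".
function containsAttribute(string $html, string $attribute): bool
{
    $pattern = '/\s' . preg_quote($attribute, '/') . '\s*=/i';
    return preg_match($pattern, $html) === 1;
}

// Sample markup standing in for a fetched page. In practice the string
// could come from file_get_contents($url), which returns false on failure.
$withAttr = '<div class="row result" data-tn-component="organicJob">Job</div>';
$withoutAttr = '<div class="row result">Job</div>';

var_dump(containsAttribute($withAttr, 'data-tn-component'));    // bool(true)
var_dump(containsAttribute($withoutAttr, 'data-tn-component')); // bool(false)
```

If the check comes back false for the live page, the server is probably sending different markup than the browser shows, and no selector will match.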

So the final code should be:

include '/absolute_path/simple_html_dom.php';

// file_get_html() fetches the URL and returns a simple_html_dom object
$html = file_get_html('http://www.indeed.com/jobs?q=Electrician&l=maine');

foreach ($html->find('div[data-tn-component="organicJob"]') as $div) {
    echo $div->innertext;
}
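
To confirm that the attribute selector itself is sound without depending on the live page, the same kind of query can be run against a small inline sample. The sketch below deliberately uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so it runs without the library (assuming the DOM extension is enabled). The sample markup and the helper name findOrganicJobs are made up for illustration. The XPath //div[@data-tn-component="organicJob"] plays the role of the selector div[data-tn-component="organicJob"].

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div> whose
// data-tn-component attribute equals "organicJob".
function findOrganicJobs(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//div[@data-tn-component="organicJob"]') as $div) {
        $results[] = trim($div->textContent);
    }
    return $results;
}

// Inline sample standing in for the job listing page.
$sample = '<html><body>'
    . '<div class="row result" data-tn-component="organicJob">Electrician</div>'
    . '<div class="row result">Sponsored</div>'
    . '</body></html>';

print_r(findOrganicJobs($sample));
```

With this sample, only the first div should come back. If a query like this matches on sample markup but the real page yields nothing, the problem lies in the fetched content rather than in the selector.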

Upvotes: 1
