bs7280

Reputation: 1094

php dom not accepting url

I am trying to create a program that opens a text file containing URLs separated by |. It takes the first URL in the file, crawls it, and removes it from the file. Each URL is scraped by a basic crawler. I know the crawler part works, because if I enter one of the URLs as a quoted string rather than as a variable read from the text file, it works fine. At this point the script returns nothing, because the URL is simply not accepted.

This is a stripped-down version of my code; I had to simplify it a lot to isolate the problem.

$urlarray = explode("|", $contents = file_get_contents('urls.txt'));

$url = $urlarray[0];
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($url);

$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $element)
{
    $title = $element->getAttribute('title');
    $class = $element->getAttribute('class');
    if($class == 'result_link')
    {
        $title = str_replace('Synonyms of ', '', $title);
        echo $title . "<br />";
    }
}

Upvotes: 0

Views: 1050

Answers (2)

Tim Wickstrom

Reputation: 5701

The code below works like a champ; tested with your example data:

<?php
$urlarray = explode("|", $contents = file_get_contents('urls.txt'));

$url = $urlarray[0];

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

$dom = new DOMDocument();
@$dom->loadHTML($html);

$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $element)
{
    $title = $element->getAttribute('title');
    $class = $element->getAttribute('class');
    if($class == 'result_link')
    {
        $title = str_replace('Synonyms of ', '', $title);
        echo $title . "<br />";
    }
}
?>

Almost forgot: let's now put it in a loop to iterate over all the URLs:

<?php
    $urlarray = explode("|", $contents = file_get_contents('urls.txt'));

    foreach($urlarray as $url) {
        if(!empty($url)) {
            $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
            curl_setopt($ch, CURLOPT_URL,trim($url));
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            $html = curl_exec($ch);

            $dom = new DOMDocument();
            @$dom->loadHTML($html);

            $anchors = $dom->getElementsByTagName('a');
            foreach($anchors as $element)
            {
                $title = $element->getAttribute('title');
                $class = $element->getAttribute('class');
                if($class == 'result_link')
                {
                    $title = str_replace('Synonyms of ', '', $title);
                    echo $title . "<br />";
                }
            }
            echo '<hr />';
        }
    }
?>
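One thing the loop above does not do is remove each crawled URL from urls.txt, which the original question asked for. A minimal sketch of that step, assuming the pipe-separated format shown (the helper name popFirstUrl is mine, not from the post):

```php
<?php
// Sketch: pop the first URL off a pipe-separated list file and
// write the remainder back, so each crawled URL is removed.
function popFirstUrl(string $path): ?string
{
    $urls = array_filter(array_map('trim', explode('|', file_get_contents($path))));
    $next = array_shift($urls);                    // URL to crawl now (null if list is empty)
    file_put_contents($path, implode('|', $urls)); // persist what is left
    return $next;
}

// Demo against a temporary file rather than the real urls.txt:
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "http://a.com|http://b.com");
echo popFirstUrl($path) . "\n";        // http://a.com
echo file_get_contents($path) . "\n";  // http://b.com
```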

Upvotes: 1

Tim Wickstrom

Reputation: 5701

So if you put in a URL manually, e.g. $url = 'http://www.mywebsite.com';, everything works as expected?

If so, there is a problem here: $urlarray = explode("|", $contents = file_get_contents('urls.txt'));

Are you sure urls.txt is loading? Are you sure it contains http://a.com|http://b.com etc.?

I would var_dump($contents = file_get_contents('urls.txt')) before the explode statement to see if the file is loading at all.

If it is, I would then explode it into $urlarray and var_dump($urlarray[0]).

If that looks right, I would trim it before sending it to DOM, with trim($urlarray[0]).

I may even go as far as validating with a regex to make sure these URLs are in fact URLs before sending them to DOM.

Let me know the results and I will try to help further, or post all your sample code including urls.txt and I will run it locally.
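To illustrate the trim-and-validate idea above: PHP's built-in filter_var() with FILTER_VALIDATE_URL can stand in for a hand-written regex, and the trim matters because an invisible trailing newline is the usual reason a URL read from a file fails while the same URL typed as a literal works. (The sample values below are mine, for illustration.)

```php
<?php
// Sketch: reject malformed entries before they ever reach DOMDocument.
$candidates = ["http://www.example.com\n", "not a url", " http://example.org "];

foreach ($candidates as $raw) {
    $url = trim($raw);                                  // strip stray \n, \r, spaces
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        echo "crawl: $url\n";
    } else {
        echo "skip:  $url\n";
    }
}
// crawl: http://www.example.com
// skip:  not a url
// crawl: http://example.org
```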

Upvotes: 0
