Alex C

Reputation: 17024

Can't read Meta-Redirect URL in a DOMDocument

I'm trying to read the meta redirect of a website. The data is in a curl request (I've built a stub to test with).

The part that isn't working is reading the URL. Can any PHP DOMDocument experts tell me why this code fails? I'm trying to get the URL out of the meta refresh tag.

    $r['body'] = '<HTML><HEAD><TITLE>Meta Refresh Example</TITLE>'.
                 '<meta http-equiv=refresh content="12; URL=meta2.htm">'.
                 '<link rel="stylesheet" href="../bwsrstyle.css" type="text/css">'.
                 '<LINK REL="SHORTCUT ICON" href="/myicon.ico">'.
                 '<meta http-equiv="Content-Type" content="text/html; charset=></HEAD>'.
                 '<BODY BGCOLOR="#FFFFFF" TEXT="#000000">foo</BODY></HTML>';

    $dom = new DOMDocument();
    @$dom->loadHTML($r['body']);
    $xpath = new DOMXpath($dom);
    $meta_redirect = $xpath->query("//meta[@http-equiv='refresh']");

    foreach ($meta_redirect as $node) {
        echo "\nNODE: {$node->getAttribute('http-equiv')} ".
             "\nURL: {$node->getAttribute('url')}\n";
    }

The 'refresh' value comes through correctly, but the URL does not.

Upvotes: 1

Views: 831

Answers (2)

Dimitre Novatchev

Reputation: 243479

You do not have a well-formed XML document at all, but supposing it were well-formed, use:

    substring-after(/*/*/meta[@http-equiv="refresh"]/@content, " URL=")
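To run an expression like this from PHP, one option is DOMXPath::evaluate(), which returns a typed result (here a string) rather than a node list. The following is a minimal sketch, not a definitive implementation. It assumes a cleaned-up stub modelled on the question's HTML, and it assumes the content value is spelled exactly "12; URL=...", because substring-after() is case-sensitive. Note the @ in front of http-equiv, which the attribute test needs.

```php
<?php
// Stub HTML modelled on the question's example (an assumption: cleaned up so it parses).
$html = '<html><head><title>Meta Refresh Example</title>'
      . '<meta http-equiv="refresh" content="12; URL=meta2.htm">'
      . '</head><body>foo</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// evaluate() returns the string produced by substring-after().
// If no matching meta tag exists, the result is an empty string.
$url = $xpath->evaluate(
    'substring-after(/*/*/meta[@http-equiv="refresh"]/@content, " URL=")'
);

echo $url, "\n";   // meta2.htm
```

If the page writes "url=" in lower case or leaves out the space after the semicolon, this expression returns an empty string, so splitting the content value in PHP may be more robust.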

Upvotes: 1

mario

Reputation: 145482

There is no attribute url=. You need to query for the content= attribute.

    print "\nURL: {$node->getAttribute('content')}\n";

You will also have to split the resulting string manually, since it still contains the 12; URL= prefix. The DOM functions do not normally handle this.
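The manual split could look something like the sketch below. The helper name refresh_url() is made up for illustration, and the pattern is an assumption about how refresh values are usually written (an optional delay, a separator, then url= in any letter case), not a complete parser. It needs PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper: pulls the target URL out of a meta refresh content value.
function refresh_url(string $content): ?string
{
    // Optional delay, a ';' or ',' separator, then url=..., matched case-insensitively.
    if (preg_match('/^\s*[\d.]*\s*[;,]\s*url\s*=\s*(.+?)\s*$/i', $content, $m)) {
        // Drop quotes that some pages put around the URL.
        return trim($m[1], "\"'");
    }
    return null;   // no URL part: the page only reloads itself
}

echo refresh_url('12; URL=meta2.htm'), "\n";   // meta2.htm
```

Inside the question's loop, this would be called as refresh_url($node->getAttribute('content')). The result is still relative (meta2.htm), so it has to be resolved against the URL of the page that was fetched.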

Upvotes: 2
