jnbdz

Reputation: 4403

How do I get the link element in an HTML page with PHP?

First, I know that I can get the HTML of a webpage with:

file_get_contents($url);

What I am trying to do is get a specific link element in the page (found in the head).

For example:

<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)

My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?

My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.

Also, the HTML might have errors. I don't have control over how my users code their stuff.

Upvotes: 1

Views: 1792

Answers (3)

karim79

Reputation: 342795

Using PHP's DOMDocument, this should do it (untested):

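$html = file_get_contents($url);   // $url is the page address from the question
libxml_use_internal_errors(true);  // keep malformed HTML from emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
// Search the whole document: a broken page may not have a usable <head>.
foreach ($doc->getElementsByTagName('link') as $link) {
    if ($link->getAttribute('rel') === 'service') {
        echo $link->getAttribute('href');
    }
}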
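
A slightly more defensive variant of the same idea, wrapped in a function, might look like the sketch below. The function name find_link_href is made up for illustration and is not part of any library. It assumes that rel should be treated as a space-separated, case-insensitive list of tokens (so rel="alternate service" also matches), and it restores the previous libxml error setting when it is done.

```php
<?php
// Hypothetical helper (name invented for this example): returns the href of
// the first <link> whose rel attribute contains the given token, or null.
function find_link_href(string $html, string $rel): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parser warnings internally so broken markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $wanted = strtolower($rel);
    foreach ($doc->getElementsByTagName('link') as $link) {
        // Assumption: rel is a whitespace-separated list of tokens.
        $tokens = preg_split(
            '/\s+/',
            strtolower(trim($link->getAttribute('rel'))),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        if (in_array($wanted, $tokens, true) && $link->hasAttribute('href')) {
            return $link->getAttribute('href');
        }
    }

    return null;
}
```

With the HTML from the question in $html, find_link_href($html, 'service') should return the raw href value; it is not yet resolved against the page URL.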

Upvotes: 3

Marc-Christian Schulze

Reputation: 3264

I'm working with Selenium under Java for web application testing. It provides very nice features for document traversal using CSS selectors.

Have a look at How to use Selenium with PHP.
But this setup might be too complex for your needs if you only want to extract this one link.

Upvotes: 0

John Green

Reputation: 13445

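You should get the base element, but know how it works and its scope: its href becomes the base for every relative URL in the document, including the href of a link element.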
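
As a rough sketch of what honoring base could look like with DOMDocument: the two function names below are invented for this example, and the resolver is deliberately simplified. It assumes an absolute http(s) page URL and does not handle "../" segments or query-only references, so it is not a full RFC 3986 implementation.

```php
<?php
// Hypothetical helper: simplified relative-URL resolution.
function resolve_url(string $base, string $href): string
{
    // Already absolute (has a scheme such as http: or https:).
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    if (strpos($href, '//') === 0) {   // protocol-relative
        return $scheme . ':' . $href;
    }
    if (strpos($href, '/') === 0) {    // root-relative
        return $origin . $href;
    }

    // Path-relative: keep the base path up to and including its last slash.
    $path  = $parts['path'] ?? '/';
    $slash = strrpos($path, '/');
    $dir   = $slash === false ? '/' : substr($path, 0, $slash + 1);

    return $origin . $dir . $href;
}

// Hypothetical helper: the first <base> with an href wins; without one,
// the page's own URL is the base.
function effective_base(DOMDocument $doc, string $pageUrl): string
{
    foreach ($doc->getElementsByTagName('base') as $base) {
        if ($base->hasAttribute('href')) {
            return resolve_url($pageUrl, $base->getAttribute('href'));
        }
    }
    return $pageUrl;
}
```

Usage would be along the lines of resolve_url(effective_base($doc, $url), $href), where $href is the value read from the link element and $url is the address the page was fetched from.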

In truth, when I have to screen-scrape, I use phpQuery. This is an older PHP port of jQuery, and while that may sound like something of a dumb concept, it is awesome for document traversal and doesn't require well-formed XHTML.

http://code.google.com/p/phpquery/

Upvotes: 0
