streetparade

Reputation: 32888

Preg_match_all <a href

Hello, I want to extract links like <a href="/portal/clients/show/entityId/2121" >, and I want a regex that gives me /portal/clients/show/entityId/2121. The number at the end (2121) is different in other links. Any ideas?

Upvotes: 1

Views: 34443

Answers (6)

user2876137

Reputation:

This is my solution:

<?php
// get links
$website = file_get_contents("http://www.example.com"); // download contents of www.example.com

// save the value of every href="..." on an <a> tag; "~" is the regex delimiter
preg_match_all('~<a\s[^>]*?href="([^"]+)"~i', $website, $matches);

// $matches[1] already holds the captured URLs without the surrounding quotes,
// so no str_replace() cleanup is needed

// output all matches
print_r($matches[1]);
?>
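Narrowed to the link format from the question, the pattern can capture only the entityId links and, in a second group, just the trailing number. This is a minimal sketch, assuming the href is always double-quoted and the path always ends in digits; the sample $html string is made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page
$html = '<a href="/portal/clients/show/entityId/2121" >Client A</a>'
      . '<a href="/portal/clients/show/entityId/57">Client B</a>'
      . '<a href="/about">About</a>';

// Match only hrefs of the form /portal/clients/show/entityId/<digits>.
// Group 1 is the whole path; group 2 is just the trailing number.
preg_match_all(
    '~<a\s[^>]*?href="(/portal/clients/show/entityId/(\d+))"~i',
    $html,
    $matches
);

print_r($matches[1]); // the paths
print_r($matches[2]); // the IDs, as strings
```

Links that do not follow the expected path (like /about here) are simply not matched, so no extra filtering step is needed.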

I recommend avoiding XML-based parsers, because you cannot always know whether the document/website is well formed.

Best regards

Upvotes: 1

karim79

Reputation: 342645

Simple PHP HTML Dom Parser example:

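// Simple HTML DOM is a third-party library; include it first
require_once 'simple_html_dom.php';

// Create DOM from a string ($links holds the HTML markup)
$html = str_get_html($links);

// or load it from a URL (the http:// scheme is needed, otherwise a local file is read)
$html = file_get_html('http://www.example.com');

foreach($html->find('a') as $link) {
    echo $link->href . '<br />';
}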

Upvotes: 11

soulmerge

Reputation: 75714

Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # Xpath query for attributes gives a NodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
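The same DOMXPath approach can also select only the links from the question, using the XPath 1.0 starts-with() function. This is a sketch, assuming the dom extension is enabled and that all wanted links share the /portal/clients/show/entityId/ prefix; the sample markup is hypothetical.

```php
<?php
// Hypothetical sample markup; in practice this is the fetched page
$htmlAsString = '<p><a href="/portal/clients/show/entityId/2121" >A</a>'
              . '<a href="/portal/clients/show/entityId/57">B</a>'
              . '<a href="/contact">C</a></p>';

// Keep libxml warnings about sloppy markup out of the output
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);

// starts-with() keeps only the client links
$prefix = '/portal/clients/show/entityId/';
$nodeList = $xpath->query("//a[starts-with(@href, '$prefix')]/@href");

$hrefs = [];
foreach ($nodeList as $attr) {
    // Each item is a DOMAttr; ->value is the attribute text
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

libxml_use_internal_errors(true) matters for real pages: loadHTML() is lenient with malformed markup but would otherwise emit warnings for it.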

Upvotes: 7

BMBM

Reputation: 16013

When "parsing" HTML, I mostly rely on PHPQuery: http://code.google.com/p/phpquery/ rather than regex.

Upvotes: 1

Yacoby

Reputation: 55445

Regex for parsing links is something like this:

'/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'

Given how horrible that is, I would recommend using Simple HTML Dom for getting the links at least. You could then check links using some very basic regex on the link href.
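Once a parser has isolated the href values, the "very basic regex" check could look like the sketch below. The $hrefs array is hypothetical input standing in for whatever the parser returned; the anchored pattern assumes the links have exactly the form shown in the question.

```php
<?php
// Hypothetical hrefs, e.g. collected with an HTML parser beforehand
$hrefs = [
    '/portal/clients/show/entityId/2121',
    '/portal/clients/show/entityId/57',
    '/portal/clients/list',
    'http://www.example.com/',
];

// A simple anchored pattern is enough once the href is isolated
$ids = [];
foreach ($hrefs as $href) {
    if (preg_match('~^/portal/clients/show/entityId/(\d+)$~', $href, $m)) {
        $ids[] = (int) $m[1]; // keep the trailing number as an integer
    }
}

print_r($ids);
```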

Upvotes: 1

Bart Kiers

Reputation: 170178

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply get the index of the last forward slash; everything after it is your number. No regex needed.
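A minimal sketch of the last-slash approach with strrpos() and substr(). The helper name lastSegment is hypothetical, chosen only for illustration, and the result is a string; ctype_digit() can confirm it is numeric if needed.

```php
<?php
// Hypothetical helper: return whatever follows the last "/" in a link,
// or null if the link contains no "/" at all.
function lastSegment(string $href): ?string
{
    $pos = strrpos($href, '/');
    if ($pos === false) {
        return null;
    }
    return substr($href, $pos + 1);
}

$id = lastSegment('/portal/clients/show/entityId/2121');
echo $id, "\n";
```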

Upvotes: 0
