Reputation: 63
I have this HTML data, and I am trying to extract the first href value from the div element below.
<div>blah blah.
<a href="http://www.example.com">example</a>
<a href="http://www.example2.com">site</a>
</div>
I tried using this regex, but I can't figure out where I am going wrong:
preg_match('/<div>.*?<a.*"(.*)">/', $html, $match);
Could someone suggest a better approach?
Upvotes: 0
Views: 577
Reputation: 67988
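import re
x = '<div>blah blah.\n\t<a href="http://www.example.com">example</a>\n\t<a href="http://www.example2.com">site</a>\n</div>'
# Lazily skip ahead to the first " href=", then capture the shortest non-whitespace run before ">"
pattern = re.compile(r".*? href=(\S+?)>.*?", re.DOTALL)
y = pattern.match(x).groups()
print(y[0])
Output (the captured value still includes the quotes): "http://www.example.com"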
Upvotes: 0
Reputation: 4906
See hwnd's answer for a more convenient and precise approach.
If you really want to do this with a regex, though, you could use something like this:
preg_match('/<div>.*?<a[^>]+href="([^"]*)"/s', $html, $match);
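A quick way to see what goes wrong with the original pattern is to run both patterns through Python's `re` module. This assumes `re` treats `.`, lazy `*?` and greedy `*` the same way PCRE does for these particular patterns, which is worth double-checking against `preg_match` itself. `re.DOTALL` plays the role of PHP's `s` modifier: without it, `.` cannot cross the newline after "blah blah.", so `<div>.*?<a` never reaches the first link.

```python
import re

# Sample input from the question (assumed verbatim, including the newlines).
html = """<div>blah blah.
<a href="http://www.example.com">example</a>
<a href="http://www.example2.com">site</a>
</div>"""

# The pattern from the question. Without DOTALL, "." stops at the first
# newline, so there is no match at all.
original = r'<div>.*?<a.*"(.*)">'
print(re.search(original, html))  # None

# With DOTALL, the greedy ".*" runs ahead and backtracks only as far as the
# last href, so the capture lands on the *second* link.
print(re.search(original, html, re.DOTALL).group(1))  # http://www.example2.com

# The suggested pattern: [^>]+ stays inside the first <a ...> tag and [^"]*
# stops at the closing quote. DOTALL is still needed for the ".*?" part.
fixed = r'<div>.*?<a[^>]+href="([^"]*)"'
print(re.search(fixed, html, re.DOTALL).group(1))  # http://www.example.com
```

Restricting each part with a negated character class (`[^>]`, `[^"]`) is what keeps the match from spilling into the next tag; it is still fragile against attribute order, single quotes, or comments, which is why a parser is the safer choice.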
Still to say:
Upvotes: 0
Reputation: 19
You can try this:
preg_match('/<div>[^<]*?<a[^>]*\"([^>]*?)\"/', $html, $match);
var_dump($match);
Upvotes: -1
Reputation: 70732
Use the right tool for the job, not a regular expression.
// loadHTML() is an instance method; calling it statically throws an Error in PHP 8
$dom = new DOMDocument();
$dom->loadHTML('
<div>blah blah.
<a href="http://www.example.com">example</a>
<a href="http://www.example2.com">site</a>
</div>
');
$xpath = new DOMXPath($dom);
$link = $xpath->query("//div/a")->item(0);
echo $link->getAttribute('href'); //=> "http://www.example.com"
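For comparison, the same parse-instead-of-regex idea can be sketched with Python's standard-library `html.parser.HTMLParser`, which has no XPath support. `FirstDivLinkParser` is a hypothetical helper name, not part of any library. It records the href of the first `<a>` found anywhere inside a `<div>`, which is slightly looser than the XPath `//div/a` (direct children only).

```python
from html.parser import HTMLParser


class FirstDivLinkParser(HTMLParser):
    """Hypothetical helper: remember the href of the first <a> inside a <div>."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # how many <div> elements we are currently inside
        self.first_href = None  # stays None until a link with an href is seen

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_depth += 1
        elif tag == "a" and self.div_depth > 0 and self.first_href is None:
            # attrs is a list of (name, value) pairs with entities already decoded;
            # an <a> without href leaves first_href as None so a later link can fill it
            self.first_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


html = """<div>blah blah.
<a href="http://www.example.com">example</a>
<a href="http://www.example2.com">site</a>
</div>"""

parser = FirstDivLinkParser()
parser.feed(html)
parser.close()
print(parser.first_href)  # http://www.example.com
```

If exact XPath semantics matter, a third-party parser with XPath support would be closer to the `DOMXPath` version above.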
Upvotes: 3