chaz

Reputation: 63

preg_match get first href from div

I have this HTML and I am trying to extract the first href value from the div element below:

<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>

I tried this regex, but I can't figure out where I'm going wrong:

preg_match('/<div>.*?<a.*"(.*)">/', $html, $match);

Could someone suggest a better approach?

Upvotes: 0

Views: 577

Answers (4)

vks

Reputation: 67988

    import re

    x = '<div>blah blah.\n\t<a href="http://www.example.com">example</a>\n\t<a href="http://www.example2.com">site</a>\n</div>'
    # Lazily skip to the first " href=" and capture everything up to the next ">"
    pattern = re.compile(r'.*? href=(\S+?)>.*?', re.DOTALL)
    y = pattern.match(x).groups()
    print(y[0])

Output (the captured group still includes the surrounding quotes): "http://www.example.com"
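Since the question is about PHP, here is a minimal sketch of the same lazy " href=" scan using preg_match, assuming the markup is held in $html exactly as in the question. Because the capture includes the quote characters, trim() strips them afterwards; the variable names are only illustrative.

```php
<?php
// Sample markup from the question (assumed to be the whole input).
$html = '<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>';

// Same idea as the Python pattern: find the first " href=" and take the
// shortest run of non-whitespace characters that is followed by ">".
// preg_match already returns the leftmost match, so no leading .*? is needed.
$href = null;
if (preg_match('/ href=(\S+?)>/', $html, $m) === 1) {
    // The captured value still includes its quotes, so strip them.
    $href = trim($m[1], '"\'');
}

echo $href, "\n"; // http://www.example.com
```

Like the Python version, this relies on a space before href and a closing ">" right after the value, so it is only a quick fix for markup shaped like the sample.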

Upvotes: 0

bukart

Reputation: 4906

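See hwnd's answer for a more convenient and precise approach.

If you really want to do this with a regex, you could use something like this (the s modifier lets .*? also match the newline after "blah blah."):

preg_match('/<div>.*?<a[^>]+href="([^"]*)"/s', $html, $match);
// $match[1] is "http://www.example.com"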
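To see why the question's pattern fails and how the [^>]+href="([^"]*)" pattern behaves, here is a small comparison on the sample HTML from the question. firstCapture is a hypothetical helper made up for this sketch. Without the s modifier, . cannot cross the newline after "blah blah.", so nothing matches; with s, the greedy .* in the original pattern runs on to the last quoted value before ">", which is the second link.

```php
<?php
// Sample input from the question; the newlines matter.
$html = '<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>';

// Hypothetical helper: first capture group of $pattern in $subject, or null if no match.
function firstCapture(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

// Question's pattern: without /s, "." cannot cross the newline after "blah blah."
var_dump(firstCapture('/<div>.*?<a.*"(.*)">/', $html));
// Same pattern with /s: the greedy ".*" backtracks only as far as the LAST '"..">'
var_dump(firstCapture('/<div>.*?<a.*"(.*)">/s', $html));
// Pattern from this answer, with /s: [^"]* stops at the first closing quote
var_dump(firstCapture('/<div>.*?<a[^>]+href="([^"]*)"/s', $html));
```

The three calls print NULL, then the second URL (http://www.example2.com), then the first URL (http://www.example.com).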

That said:

  • Don't reinvent the wheel, as @hwnd said.
  • Avoid parsing HTML, XML and similar formats with regex.

Upvotes: 0

jifei

Reputation: 19

You can try this:

preg_match('/<div>[^<]*?<a[^>]*\"([^>]*?)\"/', $html, $match);
var_dump($match);

Upvotes: -1

hwnd

Reputation: 70732

Do not reinvent the wheel.

Use the right tool for the job, not a regular expression.

$dom = new DOMDocument();
$dom->loadHTML('
     <div>blah blah.
         <a href="http://www.example.com">example</a>
         <a href="http://www.example2.com">site</a>
     </div>
');
$xpath = new DOMXPath($dom);
// First <a> that is a direct child of a <div>, in document order
$link  = $xpath->query("//div/a")->item(0);
echo $link->getAttribute('href'); //=> http://www.example.com
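A slightly more defensive variation of the same DOM approach, sketched as a hypothetical helper firstDivHref. It returns null instead of failing when there is no link (item(0) returns null in that case), silences libxml warnings that messy real-world HTML tends to trigger, and uses (//div//a[@href])[1], which assumes that links nested deeper inside the div should count too.

```php
<?php
// Hypothetical helper: href of the first <a href> anywhere inside a <div>, or null.
function firstDivHref(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // First <a> with an href attribute under any <div>, in document order.
    $link = $xpath->query('(//div//a[@href])[1]')->item(0);
    return $link instanceof DOMElement ? $link->getAttribute('href') : null;
}

$html = '<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>';

echo firstDivHref($html), "\n";                  // http://www.example.com
var_dump(firstDivHref('<p>no links here</p>'));  // NULL
```

Switching back to //div/a restores hwnd's original "direct child only" behaviour if that is what you need.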

Upvotes: 3
