Jack

Reputation: 1693

Retrieve all <a href=....jpg"> tags in PHP

I used the following regex:

$regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';

but it always fails to retrieve the tags that I want.

It always misses tags like the following:

<a href="http://site.com/folder/img1.jpg" name="test">

and it also retrieves tags that I do not want, such as:

<a href="mailto:[email protected]">

and

<a href="http://site.com/folder/index.html">

How do I modify my regex so that it retrieves all the <a href="....jpg"> tags? Also, if I have the following:

<a href="http://site.com/folder/img1.jpg" name="test">

it should simply display

<a href="http://site.com/folder/img1.jpg">

and it should not retrieve the following:

<a href="mailto:[email protected]">

and

<a href="http://site.com/folder/index.html">

Thank you.

I would also appreciate a recommendation for freeware that can help generate regexes.

Upvotes: 0

Views: 920

Answers (4)

dldnh

Reputation: 8951

This will do what you want, perhaps differently from how you were expecting to do it...

<?php
// set up to parse our input
$dom = new DOMDocument();
$dom->loadHTMLFile("input.html");
$xpath = new DOMXPath($dom);

$anchors = $xpath->query("//a[contains(@href, 'http') and contains(@href, '.jpg')]");

foreach ($anchors as $anchor) {
  echo $anchor->C14N() . "\n";
}
?>
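
A variation on the same DOM approach, as a rough sketch only: it assumes the markup is already in a string (the `$html` sample below is made up for illustration), matches only href values that end in `.jpg` rather than merely contain it, and rebuilds each opening tag from the href alone so extra attributes such as `name` are dropped. The match is case-sensitive, so `.JPG` would be skipped.

```php
<?php
// Hypothetical sample markup; the answer above reads input.html instead.
$html = '<p><a href="http://site.com/folder/img1.jpg" name="test">a</a>'
      . '<a href="mailto:someone@example.com">b</a>'
      . '<a href="http://site.com/folder/index.html">c</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath 1.0 has no ends-with(), so compare the last four characters of href.
$anchors = $xpath->query("//a[substring(@href, string-length(@href) - 3) = '.jpg']");

$tags = [];
foreach ($anchors as $anchor) {
    // Rebuild the opening tag from the href alone, dropping other attributes.
    $tags[] = '<a href="' . htmlspecialchars($anchor->getAttribute('href')) . '">';
}

echo implode("\n", $tags), "\n";
```

Real-world HTML is often malformed; calling `libxml_use_internal_errors(true)` before `loadHTML()` is one way to keep parser warnings out of the output.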

Upvotes: 1

mathematical.coffee

Reputation: 56905

Try the regex

$regex = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';

And replace with '\1>'.

Notes:

  • Removed the escape in front of the "; not necessary (although you can leave them in if you want, it doesn't make a difference)
  • Added an explicit \.jpg just before the last " to only match links ending with .jpg. You might consider \.jpe?g to allow '.jpeg' as well as '.jpg' (although the former is not that common)
  • Added a [^>]* before the > of the first <a href=...> to allow for optional extra attributes like name="asdf"
  • Added capturing brackets around the (<a href="xxx") bit so that I can replace with \1> (hence stripping out all the extra attributes).
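
To see the pattern and the '\1>' replacement working together, something like the following could be tried. It is only a sketch: the `$html` sample is made up from the tags in the question (the mailto address is a placeholder), and it assumes href is the first attribute and is double-quoted, which is what the pattern requires.

```php
<?php
// Hypothetical sample input, built from the tags in the question.
$html = <<<'HTML'
<a href="http://site.com/folder/img1.jpg" name="test">image</a>
<a href="mailto:someone@example.com">mail</a>
<a href="http://site.com/folder/index.html">page</a>
<a href="http://site.com/folder/img2.jpg">image 2</a>
HTML;

$regex = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';

// Collect every matching opening tag; $m[1] holds '<a href="....jpg"'.
preg_match_all($regex, $html, $m);

// Re-append '>' to get each tag without its extra attributes.
$tags = array_map(function ($open) {
    return $open . '>';
}, $m[1]);

// The same rewrite done in place with the suggested replacement string.
$cleaned = preg_replace($regex, '\1>', $html);

print_r($tags);
```

Here `$tags` ends up holding only the two .jpg links, while `$cleaned` is the original text with name="test" stripped from the first tag and the mailto and index.html links left untouched.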

Re a regex generating tool, I don't know of any that generate regex. I think your best bet is to learn regex yourself and then use an interactive tester to quickly develop it.

I highly recommend regexr.com.

If you follow that link you'll see exactly the regex I entered in and some test data to play around with it.

Then you can play around with the regex and see the results in real-time -- it's very helpful for fast development of regexes.

(Although regexr.com does not offer the ungreedy 'U' modifier, you can simulate it by converting every + to +? and every * to *? in the regex.)
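
As a quick check of that conversion, this sketch runs the pattern with the U modifier next to a hand-converted version with lazy quantifiers. The variable names are just for illustration, and the input is the tag from the question.

```php
<?php
$input = '<a href="http://site.com/folder/img1.jpg" name="test">';

// Pattern as given above, relying on the U (ungreedy) modifier.
$withU = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';
// Same pattern with U dropped and each quantifier made lazy by hand.
$lazy  = '/(<a href="([^"]+?)\.jpg")[^>]*?>/i';

preg_match($withU, $input, $a);
preg_match($lazy, $input, $b);

// Compare the full match and both capture groups.
var_dump($a === $b);
```

Both calls should fill their arrays with the same whole match and the same two groups, so the lazy form can be pasted into a tester that lacks the U modifier.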

Upvotes: 2

iambriansreed

Reputation: 22241

Check out http://gskinner.com/RegExr/.

I love that thing.

It will teach you how to construct your own patterns.

Regex (regular expressions) is an invaluable programming skill that is applicable in many server-side and client-side programming languages.

Upvotes: 1

Igor Escobar

Reputation: 1087

I don't know exactly what you are using this regex for, but I think this should work for you:

$your_string = '<a href="http://site.com/folder/img1.jpg" name="test">';
preg_match('@<a href="(.*?)".*?>(.*<\/a>)?@', $your_string, $matches);

// $matches[0] holds the whole matched tag; $matches[1] holds the captured href value
echo $matches[1]; // http://site.com/folder/img1.jpg

Upvotes: 1
