Reputation: 1693
I used the following regex:
$regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';
but it always fails to retrieve the tags that I want.
It always misses tags such as the following:
<a href="http://site.com/folder/img1.jpg" name="test">
and it also retrieves tags that I do not want, such as:
<a href="mailto:[email protected]">
and
<a href="http://site.com/folder/index.html">
How do I modify my regex so that it retrieves all the <a href="....jpg"> tags?
Also, if I have the following:
<a href="http://site.com/folder/img1.jpg" name="test">
it should simply display
<a href="http://site.com/folder/img1.jpg">
and it should not retrieve the following:
<a href="mailto:[email protected]">
and
<a href="http://site.com/folder/index.html">
Thank you.
I would also appreciate a recommendation for a freeware tool that can help generate regexes.
Upvotes: 0
Views: 920
Reputation: 8951
This will do what you want, perhaps differently from how you were expecting to do it...
<?php
// set up to parse our input
$dom = new DOMDocument();
$dom->loadHTMLFile("input.html");
$xpath = new DOMXPath($dom);
$anchors = $xpath->query("//a[contains(@href, 'http') and contains(@href, '.jpg')]");
foreach ($anchors as $anchor) {
    // C14N() serialises the whole element, including any extra attributes and its contents
    echo $anchor->C14N() . "\n";
}
?>
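Since the question asks for the tag with only its href, here is a minimal sketch of a variant that rebuilds the opening tag from the href attribute alone. It assumes the HTML is already in a string rather than in input.html; the sample markup, the placeholder mailto address, and the $tags variable are made up for illustration.

```php
<?php
// Hypothetical sample input; the answer above reads from input.html instead.
$html = <<<'HTML'
<a href="http://site.com/folder/img1.jpg" name="test">one</a>
<a href="mailto:someone@example.com">mail</a>
<a href="http://site.com/folder/index.html">index</a>
HTML;

// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$anchors = $xpath->query("//a[contains(@href, 'http') and contains(@href, '.jpg')]");

$tags = [];
foreach ($anchors as $anchor) {
    // Rebuild the opening tag from the href only, which drops name= and any other attributes.
    $tags[] = '<a href="' . htmlspecialchars($anchor->getAttribute('href')) . '">';
}

echo implode("\n", $tags), "\n";
```

Note that contains() matches '.jpg' anywhere in the href, not only at the end, so a URL such as photo.jpg.html would also be selected.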
Upvotes: 1
Reputation: 56905
Try the regex
$regex = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';
And replace with '\1>'.
Notes:
I added \.jpg just before the last " to only match links ending with .jpg. You might consider \.jpe?g to allow '.jpeg' as well as '.jpg' (although the former is not that common).
I added [^>]* before the > of the first <a href=...> to allow for optional extra attributes like name="asdf".
I put brackets around the (<a href="xxx") bit so that I can replace with \1> (hence stripping out all the extra attributes).
As for a regex-generating tool, I don't know of any. I think your best bet is to learn regex yourself and then use an interactive tester to develop patterns quickly.
I highly recommend regexr.com.
If you follow that link you'll see exactly the regex I entered in and some test data to play around with it.
Then you can play around with the regex and see the results in real time, which is very helpful for developing regexes quickly.
(Note that regexr.com does not offer the ungreedy 'U' flag; just convert every + to +? and every * to *? in the regex to simulate it.)
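To make this concrete, here is a rough sketch of using the pattern from PHP to collect the cleaned-up tags rather than replace them in place. It uses the explicit lazy quantifiers (+? and *?) instead of the U flag, as described above. The sample input, the placeholder mailto address, the extra img2.JPG link, and the $tags variable are assumptions for illustration only.

```php
<?php
// Hypothetical sample input built from the kinds of tags quoted in the question.
$html = '<a href="http://site.com/folder/img1.jpg" name="test">'
      . '<a href="mailto:someone@example.com">'
      . '<a href="http://site.com/folder/index.html">'
      . '<a href="http://site.com/folder/img2.JPG">';

// Same pattern as in this answer, with the U flag replaced by lazy quantifiers.
$regex = '/(<a href="([^"]+?)\.jpg")[^>]*?>/i';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

$tags = [];
foreach ($matches as $m) {
    // $m[1] is the '<a href="....jpg"' part; appending '>' mirrors replacing with '\1>'.
    $tags[] = $m[1] . '>';
}

print_r($tags);
```

Because of the i flag, the upper-case .JPG link is picked up too, while the mailto and index.html links are skipped.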
Upvotes: 2
Reputation: 22241
Check out http://gskinner.com/RegExr/.
I love that thing.
It will teach you how to construct your own patterns.
Regex (regular expressions) is an invaluable programming skill that applies to many server-side and client-side programming languages.
Upvotes: 1
Reputation: 1087
I don't know exactly what you are using this regex for, but I think this should work for you:
$your_string = '<a href="http://site.com/folder/img1.jpg" name="test">';
preg_match('@<a href="(.*?)".*?>(.*<\/a>)?@', $your_string, $matches);
print_r($matches); // $matches[0] is the whole tag; $matches[1] is the URL: http://site.com/folder/img1.jpg
Upvotes: 1