Alfred Francis

Reputation: 451

Extracting specific links using PHP preg_match_all

I have an HTML file containing:

<img width="10" height="12" src="https://www.site.com/yughggcfgh">
<img width="11" height="15" src="https://www.site.com/yughggcfghcvbcvb">
<img width="10" height="12" src="https://www.site.com/a.jpg">
<img width="10" height="12" src="https://www.site.com/b.gif">

I want to extract into an array the images whose paths don't have an extension.
The output must be as follows:

ari[1] = <img width="10" height="12" src="https://www.site.com/yughggcfgh">
ari[2] = <img width="11" height="15" src="https://www.site.com/yughggcfghcvbcvb">

Upvotes: 0

Views: 660

Answers (2)

Lawrence Cherone

Reputation: 46660

You really should use DOMDocument or another HTML parser rather than a regex. Here's an example:

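<?php
$somesource = '<img width="10" height="12" src="https://www.site.com/yughggcfgh">
<img width="11" height="15" src="https://www.site.com/yughggcfghcvbcvb">
<img width="10" height="12" src="https://www.site.com/a.jpg">
<img width="10" height="12" src="https://www.site.com/b.gif">';

$xml = new DOMDocument();
// The @ suppresses parser warnings about imperfect markup.
@$xml->loadHTML($somesource);

$image = [];
foreach ($xml->getElementsByTagName('img') as $img) {
    // Keep the src when the 4th character from the end is not a dot,
    // i.e. the URL does not end in a three-letter extension (.jpg, .gif).
    // Note: a four-letter extension such as .jpeg would not be filtered out.
    if (substr($img->getAttribute('src'), -4, 1) != '.') {
        $image[] = $img->getAttribute('src');
    }
}

print_r($image);

/* Output:
Array
(
    [0] => https://www.site.com/yughggcfgh
    [1] => https://www.site.com/yughggcfghcvbcvb
)
*/
?>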
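
If extensions of other lengths (for example .jpeg or .webp) may appear, the check could be based on the URL path instead of a fixed character position. The following is only a sketch under that assumption; the function name imagesWithoutExtension is hypothetical and not part of the answer above. It relies on the built-in parse_url() and pathinfo() functions.

```php
<?php
// Hypothetical helper (illustrative name): returns the src of every <img>
// whose URL path has no file extension, whatever the extension length.
function imagesWithoutExtension(string $html): array
{
    // Collect parser warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue; // no src attribute, nothing to inspect
        }
        // Look only at the path, so a dot in the host or query string is ignored.
        $path = (string) parse_url($src, PHP_URL_PATH);
        if (pathinfo($path, PATHINFO_EXTENSION) === '') {
            $result[] = $src;
        }
    }
    return $result;
}
```

Calling imagesWithoutExtension($somesource) on the markup from the question should return the two extension-less URLs.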

Upvotes: 2

knittl

Reputation: 265966

Regular expressions are probably not the right tool for the job, but here you go …

You should be able to achieve your goal with negative lookbehind assertions:

preg_match_all('/src="[^"]+(?<!\.jpg|\.jpeg|\.gif|\.png)"/i', $html, $matches);
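
Since the question asks for the whole img tags, here is a sketch of how the lookbehind could be applied to get them. It assumes the markup is held in a string $html and that attributes are double-quoted. The surrounding <img ...> part of the pattern, the capture group and the i flag are my additions for illustration. The URL body is matched with [^"]+ so that a match cannot run past the closing quote when several tags share a line.

```php
<?php
$html = '<img width="10" height="12" src="https://www.site.com/yughggcfgh">
<img width="11" height="15" src="https://www.site.com/yughggcfghcvbcvb">
<img width="10" height="12" src="https://www.site.com/a.jpg">
<img width="10" height="12" src="https://www.site.com/b.gif">';

// The lookbehind sits just before the closing quote and rejects URLs
// ending in one of the listed extensions. Group 1 captures the URL.
$pattern = '/<img\b[^>]*\ssrc="([^"]+)(?<!\.jpg|\.jpeg|\.gif|\.png)"[^>]*>/i';

preg_match_all($pattern, $html, $matches);

$tags = $matches[0]; // full <img ...> tags, as requested in the question
$urls = $matches[1]; // only the src values

print_r($tags);
```

Only the extensions listed in the lookbehind are excluded, so the list would need extending for other image types.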

Upvotes: 1
