Ax.

Reputation: 320

Why is this regular expression not working?

Content of 1.txt:

Image" href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"><img src="im

Code that does not work:

<?php
$pattern = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$result = file_get_contents("1.txt");
preg_match($pattern,$result,$match);

echo "<h3>Preg_match Pattern test:</h3><br><br><pre>";
print_r($match);
echo "</pre>";
?>

I expect this result:

Array
(
    [0] => images/product_images/original_images/9961_1.jpg
    [1] => images/product_images/original_images/
    [2] => 9961_1
    [3] => .jpg
)

But I get this instead:

Array
(
    [0] => images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"> 
    [1] => images/product_images/original_images/
    [2] => 9961_1.jpg" rel="disable-zoom:false; disable-expand: false"> 
)

I'm tired of trying a million combinations of this regexp, and I don't know what's wrong. Please help, and thanks a lot!

Upvotes: 0

Views: 232

Answers (4)

Randal Schwartz

Reputation: 44056

Do not parse HTML with regex.

Do not parse HTML with regex.

Do not parse HTML with regex.
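
For anyone wondering what the alternative looks like, here is a minimal sketch using PHP's DOM extension instead of a regex. The sample markup is hypothetical (the fragment in the question is truncated, so a complete `<a>` tag is assumed), and the prefix check is just one way to pick out the product images.

```php
<?php
// Hypothetical sample input: a complete <a> tag built around the href from the question.
$html = '<a title="Image" href="images/product_images/original_images/9961_1.jpg" '
      . 'rel="disable-zoom:false; disable-expand: false"><img src="images/thumb.jpg"></a>';

$prefix = 'images/product_images/original_images/';

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$names = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only links that start with the product image directory.
    if (strpos($href, $prefix) === 0) {
        // pathinfo() strips the directory and the extension.
        $names[] = pathinfo($href, PATHINFO_FILENAME);
    }
}

print_r($names);
```

With this approach the attribute value is already isolated by the parser, so nothing after the closing quote can leak into the result.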

Upvotes: -1

StackOverflowNewbie

Reputation: 40633

Here's the basic regex:

href="((.*/)(.*?)(\.jpg))"

Upvotes: 0

Jason McCreary

Reputation: 72971

Remember that regular expression quantifiers are greedy by default. Your second capture, (.*), matches any character except a newline (unless the s/DOTALL modifier is set), as many times as it can. So it is probably capturing the rest of the line, up to the last .jpg it can find.

You can make it ungreedy, as suggested by Wrikken. But I like to ensure I am capturing exactly what I want. In your case, that looks like the value of the href attribute. So what I really want is at least one character that is not a quote, followed by the .jpg extension:

$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';
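
A quick sketch of the negated character class in action. The input string is a made-up sample with a second .jpg later on the line, and the single quote inside the class is escaped because the pattern lives in a single-quoted PHP string.

```php
<?php
// Hypothetical sample input: the href from the question plus a second .jpg further along.
$html = 'href="images/product_images/original_images/9961_1.jpg" '
      . 'rel="disable-zoom:false"><img src="images/other/9961_1.jpg">';

// [^\'"]+ cannot cross a quote, so the match has to end inside the href value.
$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';

preg_match($pattern, $html, $match);
print_r($match);
```

Because the class stops at the closing quote of the attribute, the second capture comes out as 9961_1 no matter what follows on the line.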

Upvotes: 2

Wrikken

Reputation: 70460

Make it ungreedy:

$pattern = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';
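
To see the difference, here is a small side-by-side sketch. The input is a hypothetical line with two .jpg paths on it, which is the situation where greedy and ungreedy matching diverge.

```php
<?php
// Hypothetical sample input: two .jpg paths on the same line.
$html = '<a href="images/product_images/original_images/9961_1.jpg" rel="x">'
      . '<img src="images/thumb/9961_1.jpg">';

$greedy = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$lazy   = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';

preg_match($greedy, $html, $g);
preg_match($lazy, $html, $l);

// Greedy: .* runs to the end of the line and backtracks to the LAST ".jpg".
echo $g[2], "\n";

// Ungreedy: .*? stops at the FIRST ".jpg", which is what the question expects.
echo $l[2], "\n"; // 9961_1
```

Note that (.*?) will still run past the closing quote if the first href does not end in .jpg, so it only fixes the case where the wanted extension really is the next one on the line.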

Upvotes: 4
