LanderTaker

Reputation: 55

preg_match_all not giving the same results as preg_match

I have been trying to get all the file resources referenced inside an HTML document.

My current version of the regex is:

"[^']*'([^"]*)'[^']*"|"([^"]*)"

An example of the HTML (only part of it):

<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>

                        <img class="maximize"
 src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />

Calling preg_match repeatedly, I can get:

But preg_match_all only gives me the following:

You can test it live at http://www.phpliveregex.com/p/h6T

Does this make sense? My regex probably needs something more to work.

I don't have much experience with regex. Please help me :)

Thank you in advance!

Added:

The regex actually looks something like this:

From what I can see, maybe the [^'] (no single quote) and [^"] (no double quote) conditions need a little tweaking to get a better regex...

Here is a longer HTML example: http://www.phpliveregex.com/p/h74

<p><img class="maximize" src="/courses/UMASGRUPOBDEMO/document/learning_path/Diapositiva54/Diapositiva2.jpg" style="display: block; margin-left: auto; margin-right: auto;" alt="" /></p>

<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>

                        <img class="maximize"
 src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />

Upvotes: 0

Views: 115

Answers (1)

J Earls

Reputation: 1812

Try this regex instead:

"[^"']*'([^"']*)'[^"']*"|"([^"]*)"
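A sketch of how this pattern might be used from PHP (an addition, not part of the original answer). It runs preg_match_all over the first HTML fragment from the question (whitespace between the tags trimmed) with PREG_SET_ORDER, then keeps group 1 when the '-quoted alternative matched and group 2 otherwise. The variable names are illustrative.

```php
<?php
// Input: the first HTML fragment from the question.
$html = <<<'HTML'
<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>
<img class="maximize"
 src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />
HTML;

// Suggested pattern: no class in the first alternative can cross a " or a ',
// so every match stays inside one "-delimited attribute value.
$pattern = <<<'RE'
/"[^"']*'([^"']*)'[^"']*"|"([^"]*)"/
RE;

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$values = [];
foreach ($matches as $m) {
    // With PREG_SET_ORDER an unmatched group before a matched one is reported
    // as '', and an unmatched trailing group is omitted, hence the ?? fallbacks.
    $single = $m[1] ?? '';
    $values[] = $single !== '' ? $single : ($m[2] ?? '');
}

print_r($values);
```

The second alternative captures every double-quoted attribute value, so maximize and the style string come back alongside the two image paths; those would still need to be filtered out if only file paths are wanted.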

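Your original regex was greedily picking up everything from after the second ' to the last " in the input. Because the final [^']* also accepts " characters, it runs straight past the closing quote of the style attribute, so the very first match swallows the rest of the HTML and preg_match_all has nothing left to find.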
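The following reproduction (an addition, not from the original answer) runs the question's original pattern, written without any spaces around the |, over the first HTML fragment. It assumes any such spaces were only there for readability, and shows preg_match_all returning a single match that stretches from the first " to the last " in the input.

```php
<?php
// Input: the first HTML fragment from the question.
$html = <<<'HTML'
<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>
<img class="maximize"
 src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />
HTML;

// The question's original pattern, written without spaces around the |.
$pattern = <<<'RE'
/"[^']*'([^"]*)'[^']*"|"([^"]*)"/
RE;

$count = preg_match_all($pattern, $html, $matches);

// After the url('...') value, the trailing [^']* also matches " characters,
// so the first alternative keeps going until the last " in the input.
echo $count, "\n";
// Starts at the opening quote of the <div>'s style and ends at the
// closing quote of the <img>'s style.
echo $matches[0][0], "\n";
// Group 1 still holds the url('...') path.
echo $matches[1][0], "\n";
```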

Remember that the * and + quantifiers are greedy, meaning they consume as much input as possible and only give characters back when the rest of the pattern would otherwise fail to match.
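A tiny illustration of greedy versus lazy matching, using a made-up string rather than the question's HTML:

```php
<?php
$text = 'a "b" c "d"';

// Greedy: .* grabs everything to the end, then backs off to the LAST ".
preg_match('/".*"/', $text, $greedy);

// Lazy: .*? grabs as little as it can, so it stops at the FIRST closing ".
preg_match('/".*?"/', $text, $lazy);

echo $greedy[0], "\n"; // "b" c "d"
echo $lazy[0], "\n";   // "b"
```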

You must either limit what those operators are applied to (as I did above, by excluding both " and ' from the character classes in the first alternative) or, in regex flavors that support it, turn them into non-greedy (lazy) operators by using *? or +?:

"[^']*?'[^"]*?'[^']*?"

(However, this last one still has problems. For example, with <img src="foo" alt='bar' class="myimage" /> it will grab 'bar' even though it is not part of a "-delimited string.)
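To check that caveat, the sketch below (an addition, using the example tag from the sentence above) runs both the non-greedy pattern and the character-class pattern from the top of this answer with preg_match_all. Only the latter keeps each match inside a single "-delimited value.

```php
<?php
$img = <<<'HTML'
<img src="foo" alt='bar' class="myimage" />
HTML;

// Non-greedy pattern: [^']*? may still cross a ", so the match can start
// at src="..." and end inside class="...".
preg_match_all('/"[^\']*?\'[^"]*?\'[^\']*?"/', $img, $lazy);

// Character-class pattern, for comparison: no match can cross a ".
preg_match_all('/"[^"\']*\'([^"\']*)\'[^"\']*"|"([^"]*)"/', $img, $strict);

print_r($lazy[0]);   // one match: "foo" alt='bar' class="
print_r($strict[2]); // foo and myimage
```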

Upvotes: 1
