Reputation: 55
I have been trying to extract all file resources referenced in an HTML document.
My current version of the regex is
"[^']*'([^"]*)'[^']*" | "([^"]*)"
An example HTML (only a part):
<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>
<img class="maximize"
src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />
Iterating with preg_match, I can get:
But preg_match_all only gives me the next one:
You can live test it at http://www.phpliveregex.com/p/h6T
Does this make sense? My regex probably needs something more to work.
I don't have much experience with regex. Please help me :)
Thank you in advance!
Added:
The regex actually is something like:
From what I can see, the "no single quotes" and "no double quotes" conditions may need some adjustment to make the regex better...
Now using a longer HTML example: http://www.phpliveregex.com/p/h74
<p><img class="maximize" src="/courses/UMASGRUPOBDEMO/document/learning_path/Diapositiva54/Diapositiva2.jpg" style="display: block; margin-left: auto; margin-right: auto;" alt="" /></p>
<div style="background-image: url('/courses/UMASGRUPOBDEMO/document/learning_path/El_Contrato_de_Seguro-_Contenido_Teorico/video_pres_cto_seguro.jpg');display: block; margin-left: auto; margin-right: auto;"></div>
<img class="maximize"
src="/courses/CURSODESTINOPEQUENO/document/learning_path/LECCION_1_2_3_4_5_-_corta/Diapositiva01-29332.jpg" style="display: block; margin-left: auto; margin-right: auto;" />
Upvotes: 0
Views: 115
Reputation: 1812
Try this regex instead:
"[^"']*'([^"']*)'[^"']*"|"([^"]*)"
Your original regex was greedily picking up everything from after the second ' to the last " in the input.
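To make the difference concrete, here is a minimal sketch that runs both patterns over a shortened version of the question's HTML. It uses Python's re module as a stand-in for preg_match_all (for patterns this simple the two engines should behave the same, but that is an assumption). The paths are hypothetical abbreviations, the helper name `captures` is made up for illustration, and the original pattern is written without the spaces around `|`.

```python
import re

# Shortened, hypothetical stand-in for the HTML in the question.
HTML = """<div style="background-image: url('/a/video.jpg');display: block;"></div>
<img class="maximize"
src="/b/slide.jpg" style="display: block;" />"""

# Pattern from the question (spaces around | removed) and the suggested fix.
ORIGINAL = r'''"[^']*'([^"]*)'[^']*"|"([^"]*)"'''
FIXED = r'''"[^"']*'([^"']*)'[^"']*"|"([^"]*)"'''

def captures(pattern, text):
    """Return the non-empty capture group of every match, in order."""
    return [g1 or g2 for g1, g2 in re.findall(pattern, text)]

original_hits = captures(ORIGINAL, HTML)
fixed_hits = captures(FIXED, HTML)

# The original finds a single match: its trailing [^']* runs across the
# following tags up to the last " in the input, swallowing the <img> src.
print(original_hits)  # ['/a/video.jpg']

# The fix stops at the nearest quote, so every quoted value is matched.
print(fixed_hits)  # ['/a/video.jpg', 'maximize', '/b/slide.jpg', 'display: block;']
```

Note that the fixed pattern returns every double-quoted attribute value, not only file paths, so the results would still need filtering (for example by file extension).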
Remember that the * and + operators in regex are greedy, meaning they will consume as much as possible in order to match.
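As a small illustration of greedy matching, the sketch below compares `.*` with its lazy form `.*?` on a made-up one-line input. It is written in Python rather than PHP, and the input string is purely hypothetical.

```python
import re

# Hypothetical input containing two double-quoted values.
text = 'a="one" b="two"'

# Greedy: .* consumes as much as possible, so the match runs to the last quote.
greedy = re.findall(r'".*"', text)

# Lazy: .*? consumes as little as possible, so each match ends at the nearest quote.
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"one" b="two"']
print(lazy)    # ['"one"', '"two"']
```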
You must either limit what those operators are applied to (as I did above) or turn them into non-greedy operators, for the regex systems that support it, by using *? or +?:
"[^']*?'[^"]*?'[^']*?"
(However, this last one will still have issues, for example with <img src="foo" alt='bar' class="myimage" />, which will grab 'bar' even though it's not part of a "-delimited string.)
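The caveat can be checked with a short sketch, again using Python's re module as an assumed stand-in for PHP's preg functions. It runs the non-greedy pattern and the character-class pattern from this answer against the example tag.

```python
import re

# Non-greedy variant and the character-class variant from the answer.
LAZY = r'''"[^']*?'[^"]*?'[^']*?"'''
FIXED = r'''"[^"']*'([^"']*)'[^"']*"|"([^"]*)"'''

tag = '''<img src="foo" alt='bar' class="myimage" />'''

# The lazy pattern starts at the " before foo and is allowed to cross the
# closing " of src, so the match spans three attributes and includes 'bar'.
lazy_match = re.search(LAZY, tag).group(0)
print(lazy_match)  # "foo" alt='bar' class="

# The character-class pattern cannot cross a " and only sees the two
# double-quoted values.
fixed_hits = [g1 or g2 for g1, g2 in re.findall(FIXED, tag)]
print(fixed_hits)  # ['foo', 'myimage']
```

Excluding both quote characters inside the classes is what keeps each match within a single double-quoted string; laziness alone does not give that guarantee.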
Upvotes: 1