I'm trying to extract certain URLs from HTML (for example, all that begin with http, contain /tempfiles/ and end in .jpg). I have something like:
http.*?\/tempfiles\/.*?\.jpg
The problem is that when I have HTML like:
blah blah <img src=http://somelink/file.html>http://server/tempfiles/blah.jpg
blah blah
It returns a match that starts at http://somelink/file.html and runs all the way to http://server/tempfiles/blah.jpg, instead of just http://server/tempfiles/blah.jpg.
Is there a way to say that there must not be a second http between the first http and /tempfiles/?
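A minimal sketch that reproduces the problem, assuming Python's `re` module (the question does not name a language) and using the two sample lines above as input. Lazy quantifiers only control where a match ends; the engine still tries the leftmost starting position first, so the match begins at the first "http" it sees.

```python
import re

# Sample input: the two lines of HTML from the question.
html = "blah blah <img src=http://somelink/file.html>http://server/tempfiles/blah.jpg\nblah blah"

# The question's original pattern. The match starts at the FIRST "http"
# and the lazy .*? simply stretches until "/tempfiles/" is found.
original = re.compile(r"http.*?\/tempfiles\/.*?\.jpg")

matches = original.findall(html)
print(matches)  # ['http://somelink/file.html>http://server/tempfiles/blah.jpg']
```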
You may use
http(?:(?!http).)*?/tempfiles/.*?\.jpg
Details
- http - a http substring
- (?:(?!http).)*? - any char other than a newline char, 0 or more repetitions, as few as possible, that does not start a http char sequence
- /tempfiles/ - a literal substring
- .*? - any 0+ chars other than newline, as few as possible
- \.jpg - a .jpg substring.
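The same check with the pattern from this answer, again as a sketch assuming Python's `re` module. The second input string is a hypothetical example (not from the question) with a non-matching URL sitting between two matching ones, to show that the match can no longer run across the start of another "http".

```python
import re

# Sample input: the two lines of HTML from the question.
html = "blah blah <img src=http://somelink/file.html>http://server/tempfiles/blah.jpg\nblah blah"

# Every character consumed between the leading "http" and "/tempfiles/"
# is first checked with (?!http), so a match cannot contain a second "http"
# before "/tempfiles/".
tempered = re.compile(r"http(?:(?!http).)*?/tempfiles/.*?\.jpg")

matches = tempered.findall(html)
print(matches)  # ['http://server/tempfiles/blah.jpg']

# Hypothetical extra input: http://b/other.png is skipped instead of being
# glued onto the next /tempfiles/ URL.
other = "x http://a/tempfiles/1.jpg y http://b/other.png z http://c/tempfiles/2.jpg"
print(tempered.findall(other))  # ['http://a/tempfiles/1.jpg', 'http://c/tempfiles/2.jpg']
```

With the question's original pattern, the second input would instead yield 'http://b/other.png z http://c/tempfiles/2.jpg' as its second match.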