Lee Tickett

Reputation: 6027

Non Greedy Regex Start With But Doesn't Contain String

I'm trying to extract certain URLs from HTML (for example, all that begin with http, contain /tempfiles/ and end in .jpg). I have something like:

http.*?\/tempfiles\/.*?\.jpg

The problem is when I have HTML like:

blah blah <img src=http://somelink/file.html>http://server/tempfiles/blah.jpg
blah blah

It returns everything from the first http onward: http://somelink/file.html, the junk in between, and then http://server/tempfiles/blah.jpg
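A minimal reproduction of the overmatch, assuming Python's re module (the question does not name a regex engine, and the variable names here are only illustrative). Making the quantifier lazy does not help, because the engine still anchors the match at the first http it finds and only then expands until /tempfiles/ appears:

```python
import re

# Sample HTML from the question, kept on one line as in the original.
sample_html = (
    "blah blah <img src=http://somelink/file.html>"
    "http://server/tempfiles/blah.jpg\n"
    "blah blah"
)

# Pattern from the question, unchanged.
question_pattern = re.compile(r"http.*?\/tempfiles\/.*?\.jpg")

# The match starts at the FIRST "http" and runs through to ".jpg".
overmatched = question_pattern.findall(sample_html)
print(overmatched)
```

The single match covers both URLs, which is the unwanted behavior described above.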

Is there a way to say there must not be a second http between the first and the /tempfiles/?

Upvotes: 2

Views: 82

Answers (1)

Wiktor Stribiżew

Reputation: 626794

You may use

http(?:(?!http).)*?/tempfiles/.*?\.jpg

See the regex demo.

Details

  • http - the literal substring http
  • (?:(?!http).)*? - any single char other than a newline that does not start an http char sequence, repeated 0 or more times, as few as possible
  • /tempfiles/ - a literal substring
  • .*? - any 0 or more chars other than a newline, as few as possible
  • \.jpg - the literal substring .jpg (the dot is escaped so it matches only a period).
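As a quick check of the pattern above, here is a small sketch assuming Python's re module (the answer itself is engine-agnostic, and the variable names are hypothetical). The negative lookahead (?!http) is tested before every consumed char, so a match attempt that begins at the first http fails as soon as it reaches the second one, and the engine retries from there:

```python
import re

# Sample HTML from the question.
page_html = (
    "blah blah <img src=http://somelink/file.html>"
    "http://server/tempfiles/blah.jpg\n"
    "blah blah"
)

# Pattern from the answer: each char between "http" and "/tempfiles/"
# must not be the start of another "http".
tempered_pattern = re.compile(r"http(?:(?!http).)*?/tempfiles/.*?\.jpg")

# findall returns whole matches because the pattern has no capturing groups.
found_urls = tempered_pattern.findall(page_html)
print(found_urls)
```

Only the second URL is returned. Note that the trailing .*? is not tempered, so in other inputs the part after /tempfiles/ could still run across unrelated text on the same line before reaching .jpg.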

Upvotes: 2
