Terence Chow

Reputation: 11153

regex multiple matches with OR look behind

I have the following string:

'/photos/full/1/454/6454.jpg?20140521103415','/photos/full/2/452/54_2.jpg?20140521104743','/photos/full/3/254/C2454_3.jpg?20140521104744'

What I want to parse is the address from / to the ? but I can't seem to figure it out.

So far I have /(?<=')[^?]*/ which will properly get the first link, but the second and third link will start with ,'/photos/full/... <--notice that it starts with a ,'

If I then try /(?<=',')[^?]*/ I get the second and third link but miss the first link.

Rather than use two regexes, is there a way to combine them into one? I've tried /((?<=')|(?<=',')[^?]*/ to no avail.

My code is of the form matches = string.scan(regex), and then I run a matches.each block...

Upvotes: 1

Views: 259

Answers (3)

Cary Swoveland

Reputation: 110675

One can simply use a positive lookahead and a non-greedy quantifier; this of course is not limited to Ruby 2.0:

str.scan(/(?<=')\/.*?(?=\?)/)
  #=> ["/photos/full/1/454/6454.jpg",
  #    "/photos/full/2/452/54_2.jpg",
  #    "/photos/full/3/254/C2454_3.jpg"]

Edit: I added a positive lookbehind for the single quote. See comments.
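
To illustrate why the non-greedy quantifier matters, here is a minimal sketch comparing .*? with the greedy .* on the string from the question. The variable names lazy and greedy are made up for this illustration.

```ruby
# The string from the question, split across lines for readability.
str = "'/photos/full/1/454/6454.jpg?20140521103415'," \
      "'/photos/full/2/452/54_2.jpg?20140521104743'," \
      "'/photos/full/3/254/C2454_3.jpg?20140521104744'"

# Non-greedy: each match stops just before the first "?" that follows it.
lazy = str.scan(/(?<=')\/.*?(?=\?)/)

# Greedy: the single match runs on until just before the last "?" in the string.
greedy = str.scan(/(?<=')\/.*(?=\?)/)

puts lazy.size    # 3
puts greedy.size  # 1
```

With the greedy version the three paths are swallowed into one long match, which is why the ? after .* is needed.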

Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

You can use this:

(?<=,|^)'\K[^?]+

Here (?<=,|^) checks that the quote is preceded by a comma or by the start of the string/line, and \K drops everything matched so far (the quote here) from the match result.

Or, more simply:

[^?']+(?=\?)

That is, one or more characters that are neither a quote nor a question mark, followed by a question mark.
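
As a rough sketch of how the simpler pattern could be used with scan, assuming the string from the question is held in a variable called str (the names str and paths are only illustrative):

```ruby
# The string from the question, split across lines for readability.
str = "'/photos/full/1/454/6454.jpg?20140521103415'," \
      "'/photos/full/2/452/54_2.jpg?20140521104743'," \
      "'/photos/full/3/254/C2454_3.jpg?20140521104744'"

# Runs of characters that are neither "?" nor "'", kept only when a "?"
# comes right after them. The timestamps are followed by a quote (or the
# end of the string), so they are not returned.
paths = str.scan(/[^?']+(?=\?)/)

paths.each { |path| puts path }
```

This relies on the paths themselves never containing a quote or a question mark, which holds for the sample data.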

Upvotes: 3

zx81

Reputation: 41838

In Ruby 2, which has \K, you can use this simple regex:

'\K/[^?]+

To see all the matches:

regex = /'\K\/[^?]+/
subject.scan(regex) { |result|
  # inspect result
}

Explanation of the regex:

'                        # a literal single quote
\K                       # 'Keep Out!' abandons what we have matched so far
\/                       # '/'
[^?]+                    # any character except '?' (1 or more times,
                         # matching as many as possible)
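
To make the effect of \K concrete, here is a small sketch that runs the pattern on the string from the question and compares it with a capture-group version, which could serve as a fallback where \K is unavailable. The names str, with_k and with_group are hypothetical.

```ruby
# The string from the question, split across lines for readability.
str = "'/photos/full/1/454/6454.jpg?20140521103415'," \
      "'/photos/full/2/452/54_2.jpg?20140521104743'," \
      "'/photos/full/3/254/C2454_3.jpg?20140521104744'"

# \K discards the quote, so scan returns only the path.
with_k = str.scan(/'\K\/[^?]+/)

# Without \K: capture the path in a group. With one group, scan returns
# an array of one-element arrays, so flatten it.
with_group = str.scan(/'(\/[^?]+)/).flatten

puts with_k == with_group  # true
with_k.each { |path| puts path }
```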

Upvotes: 3
