Hawkeye001

Reputation: 811

Ruby regex to extract string between single/double quotes that may include an escaped character

I am trying to write a regex that can pull a string value from a mysql string.

That is, if I have the following generated sql string and I want to be able to extract the first_name:

my_string = "SELECT * FROM users WHERE first_name = 'first name value'"

What I currently have appears to work for most cases:

result = /first_name = ['"](.*?)['"]/i.match my_string

However, it breaks when the first_name itself contains a ' or ", e.g.

my_string = "SELECT * FROM users WHERE first_name = 'first\"s name value'"
or
my_string = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"

In both cases the match stops at the quote inside the value, so the returned group is only the part before it ("first" in the first case, "first\" in the second). How can I fix it so that the entire first_name value gets returned?
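A minimal reproduction of the behaviour, as a sketch (the names `pattern`, `plain`, `double_in_value` and `escaped_single` are only illustrative and not part of the real code):

```ruby
# Reproduction of the problem with the lazy pattern from the question.
pattern = /first_name = ['"](.*?)['"]/i

plain           = "SELECT * FROM users WHERE first_name = 'first name value'"
double_in_value = "SELECT * FROM users WHERE first_name = 'first\"s name value'"
escaped_single  = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"

# The lazy .*? stops at the first quote of either kind, so the capture is cut short.
plain_value   = pattern.match(plain)[1]
double_value  = pattern.match(double_in_value)[1]
escaped_value = pattern.match(escaped_single)[1]

p plain_value    # => "first name value"
p double_value   # => "first"
p escaped_value  # => "first\\"
```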

Upvotes: 1

Views: 1576

Answers (4)

Wiktor Stribiżew

Reputation: 627535

Ruby 1.9+ Solution: Duplicate Named Groups

You seem to need to match strings inside single or double quotes, and only between a matching pair of quotes.

Ruby regexes allow several named groups to share the same name, so each alternative can capture into the same group:

/first_name = (?:'(?<val>[^'\\]*(?:\\.[^'\\]*)*)'|"(?<val>[^"\\]*(?:\\.[^"\\]*)*)")/i

See the Rubular demo

The value between the quotes is captured in the "val" group.

Here is an IDEONE Ruby demo:

my_string = "SELECT * FROM users WHERE first_name = 'first name value'"
my_string2 = "SELECT * FROM users WHERE first_name = 'first\"s name value'"
my_string3 = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"

rx = /first_name = (?:'(?<val>[^'\\]*(?:\\.[^'\\]*)*)'|"(?<val>[^"\\]*(?:\\.[^"\\]*)*)")/i

puts rx.match my_string  # => first_name = 'first name value'
puts rx.match my_string2 # => first_name = 'first"s name value'
puts rx.match my_string3 # => first_name = 'first\'s name value'

To get the "val" (demo):

rx.match(my_string)["val"] # => first name value
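The demos above only exercise the single-quote branch. Here is a small sketch of the same idea applied to a double-quoted value. The input string and the names `double_quoted`, `single_quoted`, `double_val` and `single_val` are hypothetical, and this version of the pattern keeps both closing quotes outside the "val" group so that only the contents are captured:

```ruby
# Sketch: the duplicate-named-group pattern applied to both quote styles.
# Both closing quotes sit outside the "val" group, so only the contents are captured.
rx = /first_name = (?:'(?<val>[^'\\]*(?:\\.[^'\\]*)*)'|"(?<val>[^"\\]*(?:\\.[^"\\]*)*)")/i

# Hypothetical inputs. In a single-quoted Ruby literal, \" stays as backslash + quote.
double_quoted = 'SELECT * FROM users WHERE first_name = "first \"nick\" name"'
single_quoted = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"

double_match = rx.match(double_quoted)
single_match = rx.match(single_quoted)

# MatchData#[] with a name returns the group that actually took part in the match.
double_val = double_match && double_match["val"]
single_val = single_match && single_match["val"]

puts double_val
puts single_val
```

Whichever alternative matched, the value is read through the same name, so the calling code does not need to know which quote style was used.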

Ruby 1.8 Solution

Since named groups were only introduced in Ruby 1.9 and you need this to work in Ruby 1.8, capture the opening quote instead and restrict the value's characters with a negative lookahead on that capture:

/first_name = (['"])((?:(?!\1)[^\\])*(?:\\.(?:(?!\1)[^\\])*)*)\1/i

See the Rubular demo

The (['"]) matches a ' or " and captures it into Group 1. The (?:(?!\1)[^\\])* matches 0+ characters other than \ (due to [^\\]) that are also not the quote captured in Group 1 (due to (?!\1)). The (?:\\.(?:(?!\1)[^\\])*)* matches 0+ escape sequences (see \\.), each followed by 0+ characters other than \ and the Group 1 quote. The \1 backreference matches the corresponding closing quote.
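The breakdown above may be easier to follow with the pattern laid out in free-spacing (x) mode. This is only a sketch of the same regex with one comment per part. The literal spaces around = are escaped because x mode skips unescaped whitespace, and the second input string is a hypothetical example:

```ruby
# Sketch: the backreference pattern in free-spacing (x) mode, one comment per part.
rx = /
  first_name\ =\            # literal prefix with escaped spaces
  (['"])                    # group 1: the opening quote
  (                         # group 2: the value between the quotes
    (?:(?!\1)[^\\])*        # characters that are neither the opening quote nor a backslash
    (?:                     # then, zero or more times:
      \\.                   #   a backslash plus the character it escapes
      (?:(?!\1)[^\\])*      #   followed by more ordinary characters
    )*
  )
  \1                        # closing quote, identical to group 1
/ix

single = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"
double = %q(SELECT * FROM users WHERE first_name = "O'Brien")  # hypothetical input

single_value = rx.match(single)[2]
double_value = rx.match(double)[2]

p single_value  # => "first\\'s name value"
p double_value  # => "O'Brien"
```

Because the lookahead only excludes the quote that opened the value, the other quote character may appear unescaped inside it, as in the second input.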

See another Ruby demo:

my_string = "SELECT * FROM users WHERE first_name = 'first name value'"
my_string2 = "SELECT * FROM users WHERE first_name = 'first\"s name value'"
my_string3 = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"

rx = /first_name = (['"])((?:(?!\1)[^\\])*(?:\\.(?:(?!\1)[^\\])*)*)\1/i

puts rx.match my_string      # => first_name = 'first name value'
puts rx.match(my_string)[2]  # => first name value
puts rx.match my_string2     # => first_name = 'first"s name value'
puts rx.match(my_string2)[2] # => first"s name value
puts rx.match my_string3     # => first_name = 'first\'s name value'
puts rx.match(my_string3)[2] # => first\'s name value

Upvotes: 2

amalrik maia

Reputation: 468

You could try this

/first_name = ['"](.*?)['"]\z/i
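A quick check of this pattern, as a sketch: the \z anchor requires the closing quote to be the very last character of the string, so it only helps when the quoted value ends the query. The second query below is a hypothetical example of the case where it does not:

```ruby
# Sketch: the \z anchor forces the closing quote to be the last character of the string.
rx = /first_name = ['"](.*?)['"]\z/i

at_end     = "SELECT * FROM users WHERE first_name = 'first\\'s name value'"
not_at_end = "SELECT * FROM users WHERE first_name = 'Ann' AND age = 3"  # hypothetical query

value_at_end = rx.match(at_end)[1]
no_match     = rx.match(not_at_end)

p value_at_end  # => "first\\'s name value"
p no_match      # => nil
```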

example here

Upvotes: 1

gr1zzly be4r

Reputation: 2162

I tested this out on Rubular and it seems to get the value that you're looking for. The only thing is that it also captures your escape chars which you could replace:

# Capture the quoted value, then strip the backslashes from it
f_name = /first_name = '(.+)'/i.match(string)[1].gsub('\\', '')

Upvotes: 0

born4new

Reputation: 1687

I believe this regexp would fix it:

/first_name = ['"]((.*?)['"])*/i

Live example here.

Upvotes: 0
