I have this code, which appends decoded characters to a buffer (it is used to unescape a JS string in Ruby):
elsif hex_substring =~ /^\\u[0-9a-fA-F]{1,4}/
  hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/) do |match|
    hex_byte = match[0]
    buffer << JSON.load(%Q("#{hex_byte}"))
    hex_index += hex_byte.length
  end
...
I have a concern that scan() is matching a bit too much:
hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]
I am only using "\\ud83c\\udfec", not "\\udfec".
Is there a way in Ruby or in regex to grab only the first part?
You should use a single grouping construct here, a non-capturing one that matches 1 or more occurrences of \u followed by four hex chars. Without any capturing groups, scan returns each whole match as a plain string, so the inner group's extra item disappears from the resulting array:
.scan(/^(?:\\u[\da-fA-F]{4})+/)
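A minimal sketch of the fixed pattern plugged into the loop from the question, assuming `hex_substring` holds the raw escaped text (the sample value below is hypothetical). One thing to watch: once the regex has no capturing groups, `scan` yields the matched String itself to the block, so `match[0]` would return just its first character; the sketch uses the block argument directly instead.

```ruby
require "json"

# Hypothetical sample input: a surrogate-pair escape followed by other text.
# Single quotes keep the backslashes literal, as in the original JS source.
hex_substring = '\ud83c\udfec rest'

# No capturing groups, so scan returns whole matches as plain strings.
matches = hex_substring.scan(/^(?:\\u[\da-fA-F]{4})+/)
p matches # => ["\\ud83c\\udfec"]

buffer = +""   # mutable UTF-8 string
hex_index = 0

# The block receives the matched String, not an array of groups,
# so it is used as-is rather than indexed with [0].
hex_substring.scan(/^(?:\\u[\da-fA-F]{4})+/) do |hex_byte|
  # Let the JSON parser combine the surrogate pair into one character.
  buffer << JSON.load(%Q("#{hex_byte}"))
  hex_index += hex_byte.length
end

p buffer == "\u{1F3EC}" # => true
p hex_index             # => 12
```

Letting the JSON parser decode the whole run of escapes at once is what makes the surrogate pair come out as a single character rather than two invalid halves.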
Note that + is a simpler and shorter way to write {1,} (one or more occurrences).
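Details:
^ - start of a line (in Ruby, ^ also matches right after any newline; use \A to anchor at the start of the whole string)
(?: - start of a non-capturing group (what it matches won't be added to the final scan result as a separate item):
    \\u - a \u substring
    [\da-fA-F]{4} - four hex chars
)+ - end of the group, repeated 1 or more times (the whole group pattern sequence).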
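Because Ruby's ^ is line-anchored, the pattern can still match if hex_substring ever contains a newline before the escape sequence. A small check with a hypothetical two-line input shows the difference between ^ and \A:

```ruby
# Hypothetical input: the escape sequence starts on the second line.
text = "abc\n\\ud83c\\udfec"

# ^ matches right after the "\n", so the escape on line 2 is found.
line_anchored = text.scan(/^(?:\\u[\da-fA-F]{4})+/)

# \A only matches at index 0, where "abc" starts, so nothing is found.
string_anchored = text.scan(/\A(?:\\u[\da-fA-F]{4})+/)

p line_anchored   # => ["\\ud83c\\udfec"]
p string_anchored # => []
```

If hex_substring is always sliced so the escape starts at index 0, either anchor works; \A just states that intent explicitly.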