Ruby regex avoid matching a group

Question

I have this code running inside a buffer (used to unescape a JS string in Ruby):

  elsif hex_substring =~ /^\\u[0-9a-fA-F]{1,4}/
    hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/) do |match|
      hex_byte = match[0]
      buffer    << JSON.load(%Q("#{hex_byte}"))
      hex_index += hex_byte.length
    end
  ...

I have a concern that the scan() is matching a bit too much:

hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]

I only need the first element, "\\ud83c\\udfec", not the second one, "\\udfec".

Is there a way in Ruby or in regex to grab only the first part?

Wiktor Stribiżew · Accepted Answer

Use a single non-capturing group that matches one or more occurrences of a literal \u followed by four hex chars, and drop the capturing groups. The inner capturing group is what produced the extra item in the resulting array:

.scan(/^(?:\\u[\da-fA-F]{4})+/)

Note that + is a simpler and shorter way to write {1,} (one or more occurrences).
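As a quick check, here is a small sketch of how the two patterns behave under scan. It uses the sample input from the question; the variable names are mine, and the backslash is doubled in the pattern so that it matches a literal backslash rather than starting a Unicode escape.

```ruby
# Sample input from the question: a literal backslash-u sequence,
# not an already decoded character (single quotes keep the backslashes).
input = '\ud83c\udfec'

# With capturing groups, scan returns one array of captures per match.
with_groups = input.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]

# With only a non-capturing group, scan returns the whole matches as strings.
without_groups = input.scan(/^(?:\\u[\da-fA-F]{4})+/)
# => ["\\ud83c\\udfec"]

# {1,} and + are equivalent quantifiers, so both patterns give the same result.
same = input.scan(/^(?:\\u[\da-fA-F]{4}){1,}/) == without_groups
# => true
```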

Details

^ - start of a line (in Ruby, use \A to anchor at the start of the whole string)
(?: - start of a non-capturing group (what it matches won't be added as a separate item to the scan result):
- \\u - a literal \u substring (the backslash is escaped in the pattern)
- [\da-fA-F]{4} - four hex chars
)+ - end of the group, repeated 1 or more times.
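One consequence worth noting: once the pattern has no capturing groups, scan yields each match to the block as a String rather than an array of captures, so match[0] from the question would return only the first character. Below is a minimal sketch of the adjusted loop. It assumes buffer and hex_index behave as in the question, the sample input is hypothetical, and JSON.parse stands in for the question's JSON.load.

```ruby
require "json"

hex_substring = '\ud83c\udfec rest'  # hypothetical sample input
buffer        = +""                  # stands in for the question's buffer
hex_index     = 0

hex_substring.scan(/^(?:\\u[\da-fA-F]{4})+/) do |match|
  # match is the whole matched String here, not an array of captures.
  buffer    << JSON.parse(%Q("#{match}"))
  hex_index += match.length
end

# buffer now holds the decoded character and hex_index the number of
# source characters consumed (12 for this input).
```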
