Hartator

Reputation: 5145

Ruby regex avoid matching a group

I have this code, which appends to a buffer in order to unescape a JS string in Ruby:

  elsif hex_substring =~ /^\\u[0-9a-fA-F]{1,4}/
    hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/) do |match|
      hex_byte = match[0]
      buffer    << JSON.load(%Q("#{hex_byte}"))
      hex_index += hex_byte.length
    end
  ...

I am concerned that scan() returns more than I need:

hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]

I am using only "\\ud83c\\udfec", not "\\udfec".

Is there a way in Ruby or in regex to grab only the first part?

Upvotes: 2

Views: 115

Answers (1)

Wiktor Stribiżew

Reputation: 626794

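You should use a single non-capturing group here, one that matches 1 or more occurrences of \u followed by four hex chars, and drop the capturing groups. When a pattern contains capturing groups, scan returns an array of the captures for each match, and the inner group is what produced the extra item in the resulting array. Without capturing groups, scan returns each whole match as a plain string: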

.scan(/^(?:\\u[\da-fA-F]{4})+/)

Note that + is a simpler and shorter way to write {1,} (one or more occurrences).

Details

  • ^ - start of a line (in Ruby, ^ matches at the start of every line; use \A to anchor at the start of the string only)
  • (?: - start of a non-capturing group (what it matches won't be added to the final scan result):
    • \\u - a \u substring
    • [\da-fA-F]{4} - four hex chars
  • )+ - 1 or more occurrences (of the group pattern sequence).
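
As a rough sketch of how the two patterns differ in practice, the snippet below runs both against a sample input and then reworks the loop from the question. The sample value of hex_substring and the standalone buffer and hex_index variables are assumptions made for illustration, not the asker's actual surrounding code. Note that once the capturing groups are gone, the block receives the matched string directly, so match[0] would return only its first character.

```ruby
require "json"

# Hypothetical sample input: an escaped surrogate pair followed by other text.
hex_substring = '\ud83c\udfec rest'

# Pattern from the question: two capturing groups, so scan returns
# one array of captures per match.
with_groups = hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# with_groups is [["\\ud83c\\udfec", "\\udfec"]]

# Pattern from the answer: no capturing groups, so scan returns
# each whole match as a plain string.
without_groups = hex_substring.scan(/^(?:\\u[\da-fA-F]{4})+/)
# without_groups is ["\\ud83c\\udfec"]

# Reworked loop (assumed standalone state for illustration).
buffer    = +""
hex_index = 0
hex_substring.scan(/^(?:\\u[\da-fA-F]{4})+/) do |hex_byte|
  # hex_byte is the full matched string, not an array of captures.
  buffer    << JSON.load(%Q("#{hex_byte}"))
  hex_index += hex_byte.length
end
```

If only the first match is ever needed, hex_substring[/^(?:\\u[\da-fA-F]{4})+/] returns it directly as a string (or nil when nothing matches), which may be simpler than scan.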

Upvotes: 3
