Aleksei Matiushkin

Reputation: 121000

Ruby regex ‘backslash R’ aka ‘\R’ pattern

I am pretty sure I read somewhere that \R was introduced in Ruby 2 to match newlines regardless of where they came from: Unix \n, old Mac OS \r or Windows \r\n. That said, Ruby 2 should treat \R like %r{\r\n|\r|\n}.

This works fine:

▶ "a\nb".match /\R/
#⇒ #<MatchData "\n">
▶ "a\rb".match /\R/
#⇒ #<MatchData "\r">
▶ "a\r\nb".match /\R/
#⇒ #<MatchData "\r\n">

even when different line endings are combined:

▶ "a\r\n\nb".match /\R{2}/
#⇒ #<MatchData "\r\n\n">

It stops working, however, as soon as one tries to negate \R:

▶ "a\nb".match /[^\R]+/
#⇒ #<MatchData "a\nb">

Negating \n works fine though:

▶ "a\nb".match /[^\n]+/
#⇒ #<MatchData "a">

Unfortunately, \R is enormously hard to google. Neither the Regexp rdoc nor the Regular Expressions reference mentions it.

Would any regex guru drop an explanation here, so that it is at least easy to find by searching?

Thanks in advance.

Upvotes: 6

Views: 594

Answers (1)

sawa

Reputation: 168081

This is from the Onigmo author's documentation: https://github.com/k-takata/Onigmo/blob/master/doc/RE#L101. It says:

\R       Linebreak

         Unicode:
           (?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])

         Not Unicode:
           (?>\x0D\x0A|[\x0A-\x0D])
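As a rough check of the quoted definition, the sketch below spells out the Unicode variant by hand and compares it with the built-in \R on a few sample strings. The constant names are made up for illustration. The pattern is also transcribed rather than copied: Ruby regexp literals use \u0085 instead of \x{85}, and the range \x0A-\x0D is written here as \n-\r, which covers the same four characters (\n, \v, \f, \r).

```ruby
# Hand-written equivalent of the quoted Unicode definition of \R.
# (?>...) is an atomic group: once "\r\n" has matched, the engine does not
# back up to try "\r" on its own.
SPELLED_OUT = /(?>\r\n|[\n-\r\u0085\u2028\u2029])/

# Sample strings, one per kind of linebreak named in the definition.
SAMPLES = [
  "a\nb", "a\vb", "a\fb", "a\rb", "a\r\nb",
  "a\u0085b", "a\u2028b", "a\u2029b"
].freeze

# For every sample, record what \R matches and what the spelled-out form matches.
COMPARISON = SAMPLES.map { |s| [s, s[/\R/], s[SPELLED_OUT]] }

COMPARISON.each do |sample, builtin, spelled|
  puts format("%-12s \\R: %-10s spelled out: %-10s",
              sample.inspect, builtin.inspect, spelled.inspect)
end
```

If the two columns agree for every row, the built-in escape behaves like the alternation from the documentation, at least on these inputs.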

What seems relevant to your question is that \R is not a character class but a list of alternatives. Given that the matched sequence is not necessarily a single character (\r\n is two), I guess it could not be made into a character class. This is probably interacting in a peculiar way with negation inside [^...], which is intended to be used only with characters and/or character classes.
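If the goal is "a run of characters that contains no linebreak", two possible workarounds avoid putting \R inside brackets altogether. This is only a sketch, and the constant names are hypothetical: the first pattern uses a negative lookahead so that \R keeps its full meaning, the second is an ordinary negated class that only knows about \r and \n.

```ruby
# Consume one character at a time, but only at positions where \R cannot match.
# The /m flag makes "." match newline characters too, so the lookahead alone
# decides where the run stops.
NOT_LINEBREAK_RUN = /(?:(?!\R).)+/m

# Plain negated character class; covers only the ASCII terminators \r and \n,
# not the other characters listed in the definition of \R.
NOT_CR_LF_RUN = /[^\r\n]+/

INPUTS = ["a\nb", "a\rb", "a\r\nb"].freeze

# String#[] with a regexp returns the first match (or nil).
LOOKAHEAD_RESULTS = INPUTS.map { |s| s[NOT_LINEBREAK_RUN] }
CHAR_CLASS_RESULTS = INPUTS.map { |s| s[NOT_CR_LF_RUN] }

p LOOKAHEAD_RESULTS
p CHAR_CLASS_RESULTS
```

Both variants stop at the first line terminator in these inputs, unlike /[^\R]+/ from the question, which ran straight through the \n.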

Upvotes: 5
