entenbein

Reputation: 25

Regex - Finding two consecutive lines only differing in case

I have some input that looks like this:

Ababa
ababa
bebebe
cacaca
Dododo
dododo

How can I find pairs of consecutive lines (just two in a row) that are identical except for the case of the first character? In this example, that would be [Aa]baba and [Dd]ododo.

I guess it depends on the editor I use and the regex flavor it supports. I started with Sublime Text (case-sensitive search, of course):

^([A-Z])([a-z]+)\n\l\1\2

\l\1 works in a replacement string to lowercase the first character of group 1 (at least in Sublime Text), but it evidently does not behave the same way in a search pattern.

Any suggestions?

Thanks!

Upvotes: 2

Views: 1372

Answers (1)

Wiktor Stribiżew

Reputation: 627056

It seems you are looking for an inline case-insensitive modifier group, (?i:...), which can be placed anywhere inside the pattern. Wrap it around the first backreference:

^([A-Z])([a-z]+)\n(?i:\1)\2$
                  ^^^   ^

This makes only the first backreference case insensitive; the rest of the pattern stays case sensitive.
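
To try the idea outside an editor, here is a minimal sketch in Python, whose re module also accepts the scoped (?i:...) group (Python 3.6 or later). The sample text is the input from the question. The names PAIR_RE and find_case_pairs are made up for illustration, and re.MULTILINE is passed explicitly because Python, unlike the editors discussed here, does not enable it by default.

```python
import re

# Same pattern as above; MULTILINE makes ^ and $ match at line boundaries.
PAIR_RE = re.compile(r"^([A-Z])([a-z]+)\n(?i:\1)\2$", re.MULTILINE)

def find_case_pairs(text):
    """Return (first_line, second_line) tuples for each matching pair."""
    return [tuple(m.group(0).split("\n")) for m in PAIR_RE.finditer(text)]

sample = "Ababa\nababa\nbebebe\ncacaca\nDododo\ndododo\n"
print(find_case_pairs(sample))
# prints [('Ababa', 'ababa'), ('Dododo', 'dododo')]
```

Because (?i:\1) accepts either case, the pattern also reports two identical capitalized lines such as Ababa followed by Ababa. If that is unwanted, placing a negative lookahead (?!\1) directly before the group should exclude it in engines that support lookaheads.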

To support any linebreak style, use \R instead of \n:

^([A-Z])([a-z]+)\R(?i:\1)\2$
                ^^
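
Not every engine has \R. Python's built-in re module, for example, rejects it, so the line break has to be spelled out. The sketch below is a rough workaround under stated assumptions, not a drop-in equivalent: it handles LF and CRLF only (not a bare CR), and the names PAIR_RE and find_pairs are illustrative.

```python
import re

# \r?\n stands in for \R (LF or CRLF only).
# (?=\r?$) lets the match end just before an optional trailing \r.
PAIR_RE = re.compile(
    r"^([A-Z])([a-z]+)\r?\n(?i:\1)\2(?=\r?$)",
    re.MULTILINE,
)

def find_pairs(text):
    """Return (first_line, second_line) tuples, ignoring the line-ending style."""
    return [tuple(m.group(0).splitlines()) for m in PAIR_RE.finditer(text)]

unix_text = "Ababa\nababa\nbebebe\n"
windows_text = unix_text.replace("\n", "\r\n")
print(find_pairs(unix_text))
print(find_pairs(windows_text))
```

The lookahead replaces the plain $ because Python's $ only matches before \n or at the very end of the string, so a trailing \r would otherwise block the match on CRLF input.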

Boost Modifiers reference:

(?imsx-imsx ... ) alters which of the perl modifiers are in effect within the pattern, changes take effect from the point that the block is first seen and extend to any enclosing ). Letters before a - turn that perl modifier on, letters afterward, turn it off.

(?imsx-imsx:pattern) applies the specified modifiers to pattern only.
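
The difference between a scoped group and a pattern-wide flag is easiest to see on a tiny input. The following sketch uses Python's re module, which supports the (?i:pattern) form quoted above. It is meant as an analogy only, since Python does not implement every Boost modifier behaviour, and the variable names are illustrative.

```python
import re

scoped = re.compile(r"(?i:a)b")           # only the 'a' ignores case
whole = re.compile(r"ab", re.IGNORECASE)  # the entire pattern ignores case

candidates = ["ab", "Ab", "aB", "AB"]
scoped_hits = [s for s in candidates if scoped.fullmatch(s)]
whole_hits = [s for s in candidates if whole.fullmatch(s)]

print(scoped_hits)  # prints ['ab', 'Ab']
print(whole_hits)   # prints ['ab', 'Ab', 'aB', 'AB']
```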


Pattern details:

  • ^ - start of a line (in Sublime Text and Notepad++, the MULTILINE mode is on by default)
  • ([A-Z]) - (Group 1) first uppercase ASCII letter (replace [A-Z] with \p{Lu} to match any Unicode uppercase letter)
  • ([a-z]+) - (Group 2) 1 or more lowercase ASCII letters (replace [a-z] with \p{Ll} to match any Unicode lowercase letters)
  • \R - any linebreak (CRLF, LF, or CR)
  • (?i:\1) - a case-insensitive backreference to Group 1 value
  • \2 - case sensitive backreference to Group 2 value
  • $ - end of a line/file.

Upvotes: 8
