Manos Nikolaidis

Reputation: 22254

How can I match the lowercase version of a backreference?

I'd like to match the lowercase version of an uppercase character in a backreference in a regex. For example, let's say I want to match a string where the 1st character is any uppercase character and the 4th character is the same letter as the first except it's a lowercase character. If I use grep with this regex:

grep -E "([A-Z])[a-z]{2}\1[a-z]"

it matches "EssEx" and "SusSe", for instance. I'd like to match "Essex" and "Susse" instead. Is it possible to modify the above regular expression to achieve this?

Upvotes: 4

Views: 312

Answers (2)

Sebastian Proske

Reputation: 8413

This is one of the cases where inline modifiers come in handy. Here is a solution that uses a case-sensitive negative lookahead to check that the next character is not exactly the same (uppercase) character, followed by a case-insensitive backreference to match the corresponding lowercase letter:

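([A-Z])[a-z]{2}(?-i)(?!\1)(?i)\1(?-i)[a-z]

Note that the first (?-i) most likely isn't needed, but it's there for clarity. The second (?-i) switches case sensitivity back on so that the final [a-z] only matches a lowercase letter; without it, "EsseX" would match as well. Inline modifiers are not supported by all regex flavours. PCRE supports them, so you will have to use GNU grep's -P option.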
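For flavours without grep -P, the same idea (a case-sensitive negative lookahead followed by a case-insensitive backreference) can be sketched in Python. This is only an illustration, not part of the original answer: the function name is made up, and because Python's re module does not accept a bare (?i) in the middle of a pattern, the sketch uses the scoped form (?i:\1) instead, which also keeps the trailing [a-z] case-sensitive.

```python
import re

# Hypothetical helper for illustration; the name is not from the thread.
# ([A-Z])    capture an uppercase letter
# [a-z]{2}   two lowercase letters
# (?!\1)     case-sensitive check: the next character is NOT the same uppercase letter
# (?i:\1)    the captured letter again, compared case-insensitively
# [a-z]      one more lowercase letter (outside the (?i:...) scope)
LOWERCASE_BACKREF = re.compile(r"([A-Z])[a-z]{2}(?!\1)(?i:\1)[a-z]")


def has_lowercase_backref(text: str) -> bool:
    """Return True if text contains e.g. 'Essex' but not merely 'EssEx'."""
    return LOWERCASE_BACKREF.search(text) is not None


if __name__ == "__main__":
    for word in ["Essex", "Susse", "EssEx", "SusSe", "EsseX"]:
        print(word, has_lowercase_backref(word))
```

Since the lookahead has already ruled out the uppercase form, the only character the case-insensitive backreference can still accept is the lowercase one.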

Upvotes: 2

anubhava

Reputation: 785611

It is more verbose, but this awk command does the job:

awk '$1 ~ /^[A-Z][a-z]{2}/ && tolower(substr($1, 1, 1)) == substr($1, 4, 1) &&
     substr($1, 5, 1) ~ /[a-z]/' file

Essex
Susse
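If neither inline modifiers nor awk are an option, a further workaround is to spell out all 26 uppercase/lowercase pairs as one alternation, which needs no backreference at all. The sketch below generates such a pattern; it is an illustrative assumption rather than something proposed in the thread, and the names build_pair_pattern and matches_pair are hypothetical. The resulting string uses only constructs that POSIX extended regular expressions also provide, so it should be usable with grep -E.

```python
import re
import string


def build_pair_pattern() -> str:
    """Build '(A[a-z]{2}a|B[a-z]{2}b|...|Z[a-z]{2}z)[a-z]'.

    Each alternative hard-codes one uppercase letter and its lowercase
    counterpart, so no backreference or case modifier is required.
    """
    alternatives = (
        f"{upper}[a-z]{{2}}{upper.lower()}" for upper in string.ascii_uppercase
    )
    return "(" + "|".join(alternatives) + ")[a-z]"


PAIR_PATTERN = build_pair_pattern()


def matches_pair(text: str) -> bool:
    """Return True if text contains an uppercase letter, two lowercase
    letters, the lowercase form of the first letter, and one more
    lowercase letter."""
    return re.search(PAIR_PATTERN, text) is not None


if __name__ == "__main__":
    print(PAIR_PATTERN)  # paste into: grep -E '<pattern>' file
    for word in ["Essex", "Susse", "EssEx", "SusSe"]:
        print(word, matches_pair(word))
```

The pattern is long but purely mechanical, which is why generating it is less error-prone than typing it by hand.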

Upvotes: 2
