user2565010

Reputation: 2030

Regex for capitalized backreference

I'm trying to create a regex that highlights any pair of consecutive letters where the second is the capitalized version of the first (which is lowercase).

For example, in the string

aSsdDsaAdfF

I want dD, aA and fF to match my regex. To put it another way, the string with the matches highlighted (shown here in brackets) should be

aSs[dD]s[aA]d[fF]

I think I need to use backreferences, but I don't know how.

Could anybody please give me a way to solve this issue?

Upvotes: 4

Views: 509

Answers (2)

user3408541

Reputation: 63

Ahoy!

I didn't check whether the other answer's regex works; it looks like a nice one-liner. Strangely, though, when I looked at this problem I didn't think of regular expressions, I thought of C-style character arrays. My solution is not a one-line regular expression but a longer, more verbose script: it splits the string into a character array and, for each pair of neighbouring characters, looks for a case-insensitive match combined with a case-sensitive mismatch.

It might be a little more descriptive and easier to build around.

#!/usr/bin/perl -w
use strict;

#check if latter character is a capitalized version of the former

my $s = "aSsdDsaAdfF";
my @s = split(//,$s);     #split string into an old fashioned c-style character array
my @matchPos;           #array to keep position of all matches
my @matchChar;         #array to keep matched character
my $count=0;               #loop counter
my $previousLetter;     #the character just before the current one

#matches must satisfy these three conditions
my $caseInsensitiveMatch;
my $caseSensitiveFail;
my $currentCharacterIsUpper; #latter has to be the capitalized version of former, not vice versa, i.e. s S will match and S s will fail

print "String is \"$s\"\n";
foreach(@s){
    if($count==0){      #skip the first letter, it has no previous character
        $count++;
        next;
    }
    #previous character (not a regex lookbehind, just the array element before this one)
    $previousLetter = $s[$count-1];
    
    #check if case insensitive compare matches AND case sensitive compare fails
    #\Q...\E escapes the letter so characters like . or * are not treated as regex syntax
    $caseInsensitiveMatch = $_ =~ /^\Q$previousLetter\E$/i;
    $caseSensitiveFail = $_ !~ /^\Q$previousLetter\E$/;
    
    #that means this is the same character, but one is upper, one is lower
    #make sure current char is upper, and lookbehind is lower
    $currentCharacterIsUpper = $_ =~ /^[A-Z]$/;

    #satisfy all three conditions this is a match
    if($caseInsensitiveMatch && $caseSensitiveFail && $currentCharacterIsUpper){ 
        print "match at position $count characters $previousLetter and $_\n";#records match at second character
        push(@matchPos, $count);
        push(@matchChar, $_);
    }
    $count++;
}
print "Matches found in position: \t\t\t@matchPos\n";
print "Characters matched are as follows: \t\t@matchChar\n";

The output looks like this:

$ perl consecutiveCharacters.pl
String is "aSsdDsaAdfF"
match at position 4 characters d and D
match at position 7 characters a and A
match at position 10 characters f and F
Matches found in position:               4 7 10
Characters matched are as follows:      D A F
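For comparison, here is a minimal sketch that folds the three conditions into one check. It assumes plain ASCII letters: under that assumption "case-insensitive match, case-sensitive fail, current is uppercase" holds exactly when the previous character is a lowercase letter and the current one is its uc form. The names @chars and $i are my own, not taken from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same array walk as above, with the three checks folded into one
# (assumes plain ASCII letters)
my $s = "aSsdDsaAdfF";
my @chars = split //, $s;
my @matchPos;

for my $i (1 .. $#chars) {
    my ($prev, $cur) = @chars[$i - 1, $i];
    # previous is lowercase and current is exactly its uppercase form
    if ($prev =~ /^[a-z]$/ && $cur eq uc($prev)) {
        push @matchPos, $i;    # index of the capital, as in the output above
    }
}
print "Matches found in position: @matchPos\n";    # 4 7 10
```

For non-ASCII input, uc and the [a-z] class would need more thought (locale and Unicode rules).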

Upvotes: 1

user557597

Reputation:

One way is this:

(?-i:([a-z])(?=[A-Z]))(?i:\1)

It uses entirely localized (scoped) case modifiers, so they don't affect anything else in the pattern.
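As a usage sketch (the variable names and the bracket "highlighting" are my own illustration, not part of the answer), the pattern can be run over the question's string in Perl with a global match loop, using @- and @+ for the match offsets, and with s///g to mark each match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s  = "aSsdDsaAdfF";
my $re = qr/(?-i:([a-z])(?=[A-Z]))(?i:\1)/;   # the regex from this answer

# Collect every match and where it starts; $-[0]/$+[0] are the match start/end
my (@pairs, @starts);
while ($s =~ /$re/g) {
    push @starts, $-[0];
    push @pairs,  substr($s, $-[0], $+[0] - $-[0]);
}

# "Highlight" the matches by wrapping each one in brackets
(my $marked = $s) =~ s/$re/[$&]/g;

print "Pairs:  @pairs\n";     # dD aA fF
print "Starts: @starts\n";    # 3 6 9 (0-based)
print "Marked: $marked\n";    # aSs[dD]s[aA]d[fF]
```

The whole match is taken from $& or the offsets rather than by wrapping $re in an extra capture group, because the \1 inside the qr// object is not renumbered when it is interpolated into a larger pattern.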

Explanation

 (?-i:                         # Cluster group with 'case sensitive' scoped modifier
      ( [a-z] )                     # (1), Lower-case
      (?= [A-Z] )                   # Lookahead, Upper-case
 )                             # End cluster
 (?i:                          # Cluster group with 'case insensitive' scoped modifier
      \1                            # Backreference to group 1
                                    # ( previous assertion guarantees this
                                    #   can only be the Upper-Cased version of group 1) 
 )                             # End cluster
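To illustrate the note on the backreference, here is a small comparison sketch. The second pattern is a hypothetical variant with the case-sensitive (?=[A-Z]) lookahead removed; without it, the case-insensitive \1 is also satisfied by a second lowercase letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with    = qr/(?-i:([a-z])(?=[A-Z]))(?i:\1)/;  # pattern from this answer
my $without = qr/([a-z])(?i:\1)/;                  # hypothetical variant: lookahead removed

my %result;
for my $t (qw(dD dd Dd DD)) {
    # 1 if the pattern matches somewhere in $t, 0 otherwise
    $result{$t} = [ ($t =~ $with ? 1 : 0), ($t =~ $without ? 1 : 0) ];
    printf "%s  with lookahead: %d  without: %d\n", $t, @{ $result{$t} };
}
```

Both patterns accept dD, but only the variant without the lookahead also accepts dd; neither accepts Dd or DD, because [a-z] is case-sensitive here.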

Upvotes: 5
