Dan

Reputation: 51

Regular expression to find one character that is preceded by an even number of the same character

I am trying to match an input string that might contain single-quote characters ('). My challenge is that I need to ignore any even number of quote characters that precede the target character, since they are considered escape characters.

The following is what I have come up with so far.

(?=('')*)'

However, this doesn't work for the purpose yet. For instance, given the input ''', the regular expression matches all three single-quote characters instead of just the last one.
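
The behaviour can be reproduced with a minimal sketch, here using Python's re module purely for illustration (the helper name match_positions is hypothetical). (?=...) is a lookahead, so it inspects the text after the current position rather than before it, and ('')* can match the empty string, so the assertion succeeds at every position and the pattern behaves like a bare '.

```python
import re

# The pattern from the question: a lookahead followed by a single quote.
ORIGINAL_PATTERN = re.compile(r"(?=('')*)'")

def match_positions(pattern, text):
    """Return the start index of every match of `pattern` in `text`."""
    return [m.start() for m in pattern.finditer(text)]

# Every quote in ''' is reported, not just the last one.
print(match_positions(ORIGINAL_PATTERN, "'''"))  # [0, 1, 2]
```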

Here are some samples.

'             ## match
''            ## no-match
'''           ## matches the last quote character
''''          ## no-match
abc'          ## matches the last quote character
Mike''s home' ## matches the last quote character only

Any help would be greatly appreciated. Thanks!

Upvotes: 2

Views: 168

Answers (5)

Casimir et Hippolyte

Reputation: 89557

With .NET you can use a variable-length lookbehind.

To find the last quote preceded by an even number of quotes in general:

(?<=^(?:[^']*'[^']*')*[^']*)'(?=[^']*$)

(You only need to anchor the subpattern inside the lookbehind at the beginning of the string, and use a lookahead to check that there are no more quotes before the end.)

For the particular case where you only need to match an unescaped quote, you can simply use:

(?<=(?<!')(?:'')*)'(?!')

(In this case, there is no need to "count" from the start of the string or to check the string until the end; you only need to check contiguous characters.)

or the same without nested lookbehinds:

(?<=(?:^|[^'])(?:'')*)'(?!')
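
For engines without variable-length lookbehind, the same "contiguous characters" idea can be sketched without any lookbehind at all. Python's built-in re module, for example, only accepts fixed-width lookbehind, so the two patterns above cannot be used there as written. The sketch below is an assumption-laden alternative, not the .NET approach itself: it finds each run of quotes and keeps the last quote of every odd-length run. The function name unescaped_quote_positions is hypothetical.

```python
import re

def unescaped_quote_positions(text):
    """Return the index of the last quote in each odd-length run of quotes."""
    positions = []
    for run in re.finditer(r"'+", text):  # each maximal run of contiguous quotes
        if len(run.group()) % 2 == 1:     # odd length: pairs escape, one quote is left over
            positions.append(run.end() - 1)
    return positions

for sample in ["'", "''", "'''", "''''", "abc'", "Mike''s home'"]:
    print(repr(sample), unescaped_quote_positions(sample))
```

On the question's samples this reports index 0, nothing, index 2, nothing, index 3, and index 12 respectively.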

Upvotes: 1

DavidRR

Reputation: 19397

Basically, it appears that you want to detect an input that contains at least one sequence of an odd number of single-quote characters.

Here is a regex that I believe will satisfy this requirement:

(^'|[^']')('')*([^']|$)

Or, the equivalent that simply adds ?: to suppress the capture groups:

(?:^'|[^']')(?:'')*(?:[^']|$)

I have written a Perl program to test this regex against the sample data you provided. (And I added some additional sample inputs as well.) Please see the following for the expected output from the program as well as the program itself.

Expected Output:

* [']
* [x']
  [x'']
* [x''']
  ['']
* [''x']
  [''x'']
  [''x''y]
* [''']
  ['''']
  [''''x]
* [abc ']
* [Mike''s home']
  [Mike''s home'']
* [Mike''s home''']
* [Mike''s home'''x]
  [Mike''s home'''']
  [Mike''s home''''x]

Perl Program to Demonstrate RegEx:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    my $match = "  ";

    if (/(^'|[^']')('')*([^']|$)/) {

#         ^^ ^^^^^ ^^^^^ ^^^^ ^
#        (1a) (1b)  (2)   3a  3b
#
# Match the line if:
# (1a) The line begins with a single quote character
#      -or-
# (1b) Somewhere contains a non-quote character followed by a single
#      quote character
# (2)  That is optionally followed by an even number of quote characters.
# (3a) And that is followed by a non-quote character
#      -or-
# (3b) The end of the line.

        $match = "* ";
    }

    print "$match\[$_\]\n";
}

__END__
'
x'
x''
x'''
''
''x'
''x''
''x''y
'''
''''
''''x
abc '
Mike''s home'
Mike''s home''
Mike''s home'''
Mike''s home'''x
Mike''s home''''
Mike''s home''''x

Upvotes: 1

revo

Reputation: 48711

I don't know which environment you are using to test regular expressions, but the regex below is PCRE-compatible and works with the examples you gave:

(?<!')(?:'')*\K'(?!')
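
Where \K is not available (Python's built-in re module does not support it, for instance), a similar effect can be sketched by wrapping the target quote in a capture group and reading that group's position. This is an adaptation under that assumption, not the PCRE pattern itself; the names UNESCAPED_QUOTE and find_unescaped_quotes are hypothetical.

```python
import re

# Same shape as the PCRE pattern, with \K replaced by a capture group
# around the target quote. The lookbehind is fixed-width, so re accepts it.
UNESCAPED_QUOTE = re.compile(r"(?<!')(?:'')*(')(?!')")

def find_unescaped_quotes(text):
    """Return the index of each quote captured by group 1."""
    return [m.start(1) for m in UNESCAPED_QUOTE.finditer(text)]

for sample in ["'", "''", "'''", "''''", "abc'", "Mike''s home'"]:
    print(repr(sample), find_unescaped_quotes(sample))
```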

Upvotes: 2

garyh

Reputation: 2852

(?:'')*(.+)

The first part, (?:'')*, is a non-capturing group that consumes pairs of quotes; the second set of parentheses captures the rest and returns it as the match.

Upvotes: 0

Mike Perrenoud

Reputation: 67898

You'll need to leverage negative look-ahead and look-behind, but bear in mind that they don't work the same way in all implementations (I honestly don't know the details; I just know that's true):

(?<!')'(?!')
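
A quick check of this pattern against the question's samples, sketched with Python's re module (the names ISOLATED_QUOTE and isolated_quote_positions are hypothetical). The two lookarounds only accept a quote that has no quote on either side, so a lone quote is found and a doubled quote is skipped, but a run of three quotes produces no match either.

```python
import re

# A quote with no quote immediately before or after it.
ISOLATED_QUOTE = re.compile(r"(?<!')'(?!')")

def isolated_quote_positions(text):
    """Return the index of every quote that has no neighbouring quote."""
    return [m.start() for m in ISOLATED_QUOTE.finditer(text)]

for sample in ["'", "''", "'''", "''''", "abc'", "Mike''s home'"]:
    print(repr(sample), isolated_quote_positions(sample))
```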

Upvotes: 0
