Jaco Pretorius

Reputation: 24840

Regex for matching a character, but not when it's enclosed in quotes

I need to match a colon (':') in a string, but not when it's enclosed by quotes - either a " or ' character.

So each of the following should have 2 matches:

something:'firstValue':'secondValue'
something:"firstValue":'secondValue'

but this should only have 1 match

something:'no:match'

Upvotes: 17

Views: 12151

Answers (5)

Radon8472

Reputation: 4941

You can try to capture the strings within the quotes:

/(?<q>'|")([\w ]+)(\k<q>)/m

The first group defines the allowed quote types; the second matches word characters (letters, digits, underscore) and spaces. The nice thing about this solution is that it ONLY matches strings whose opening and closing quotes are the same character.

Try it at regex101.com
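For anyone trying this outside regex101, here is a rough Python sketch of the same idea. Python spells the named group (?P<q>...) and the backreference (?P=q); the helper name quoted_strings is made up for illustration. One thing it makes visible: the class [\w ] does not include a colon, so the 'no:match' value from the question is not picked up by this pattern as written.

```python
import re

# Python spelling of the pattern above: (?P<q>...) names the group,
# (?P=q) requires the same quote character to close the string.
QUOTED = re.compile(r"""(?P<q>'|")([\w ]+)(?P=q)""")

def quoted_strings(text):
    """Return (quote_char, content) pairs for each quoted string found."""
    return [(m.group("q"), m.group(2)) for m in QUOTED.finditer(text)]

print(quoted_strings("something:\"firstValue\":'secondValue'"))
print(quoted_strings("something:'no:match'"))
```

This only locates the quoted strings; deciding which colons fall outside them is still a separate step.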

Upvotes: 0

Dave Sherohman

Reputation: 46187

Regular expressions are stateless. Tracking whether you are inside of quotes or not is state information. It is, therefore, impossible to handle this correctly using only a single regular expression. (Note that some "regular expression" implementations add extensions which may make this possible; I'm talking solely about "true" regular expressions here.)

Doing it with two regular expressions is possible, though, provided that you're willing to modify the original string or to work with a copy of it. In Perl:

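# Remove every quoted sequence, including any colons inside it.
$string =~ s/['"][^'"]*['"]//g;
# Count the remaining colons; "= () =" forces list context so we get a count, not just true/false.
my $match_count = () = $string =~ /:/g;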

The first will find every sequence consisting of a quote, followed by any number of non-quote characters, and terminated by a second quote, and remove all such sequences from the string. This will eliminate any colons which are within quotes. (something:"firstValue":'secondValue' becomes something:: and something:'no:match' becomes something:)

The second does a simple count of the remaining colons, which will be those that weren't within quotes to start with.
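If you are not using Perl, the same two steps can be sketched in Python roughly as follows. The regex is copied from the substitution above; the function name count_unquoted_colons is just an illustrative choice. Because Python strings are immutable, the original string is left untouched and only a stripped copy is counted.

```python
import re

# Same character classes as the Perl substitution above.
QUOTED_RUN = re.compile(r"""['"][^'"]*['"]""")

def count_unquoted_colons(text):
    """Strip quoted runs from a copy of text, then count the colons left."""
    stripped = QUOTED_RUN.sub("", text)
    return stripped.count(":")

for sample in (
    "something:'firstValue':'secondValue'",
    "something:\"firstValue\":'secondValue'",
    "something:'no:match'",
):
    print(sample, "->", count_unquoted_colons(sample))
```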

Just counting the non-quoted colons doesn't seem like a particularly useful thing to do in most cases, though, so I suspect that your real goal is to split the string up into fields with colons as the field delimiter, in which case this regex-based solution is unsuitable, as it will destroy any data in quoted fields. In that case, you need to use a real parser (most CSV parsers allow you to specify the delimiter and would be ideal for this) or, in the worst case, walk through the string character-by-character and split it manually.
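As a rough illustration of the "walk through the string character-by-character" fallback, here is a minimal Python sketch. It is not a full parser: the name split_fields is hypothetical, quotes are kept as part of each field, there is no escape handling, and an unterminated quote simply swallows the rest of the string.

```python
def split_fields(text, delimiter=":"):
    """Split text on delimiter, ignoring delimiters inside '...' or "..."."""
    fields = []
    current = []
    open_quote = None  # the quote character we are inside of, if any
    for ch in text:
        if open_quote is not None:
            current.append(ch)
            if ch == open_quote:
                open_quote = None  # matching close quote found
        elif ch in ("'", '"'):
            open_quote = ch
            current.append(ch)
        elif ch == delimiter:
            # Unquoted delimiter: finish the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

print(split_fields("something:'no:match'"))
```

Unlike the substitution approach, this keeps the quoted data intact, which is the point of using a parser here.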

If you tell us the language you're using, I'm sure somebody could suggest a good parser library for that language.

Upvotes: 3

heijp06

Reputation: 11788

I've come up with the following slightly worrying construction:

(?<=^('[^']*')*("[^"]*")*[^'"]*):

It uses a lookbehind assertion to make sure you match an even number of quotes from the beginning of the line to the current colon. It allows for embedding a single quote inside double quotes and vice versa. As in:

'a":b':c::"':" (matches at positions 6, 8 and 9)

EDIT

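Gumbo is right: using * within a lookbehind assertion is not allowed in most regex engines, so this pattern only works where variable-length lookbehind is supported (.NET, for example).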
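Where variable-length lookbehind is not available (Python's re module, for instance, only accepts fixed-width lookbehind), a different technique can give the same positions: let the regex consume whole quoted strings first and capture a colon only when it is reached outside of them. This is a sketch of that alternation trick, not the lookbehind above, and the names TOKEN and unquoted_colon_positions are made up for the example.

```python
import re

# Consume whole quoted strings first; a colon is captured only when
# the scan reaches it outside of any quoted string.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|(:)""")

def unquoted_colon_positions(text):
    """Return 0-based indexes of colons that are not inside quotes."""
    return [m.start(1) for m in TOKEN.finditer(text) if m.group(1) is not None]

# The example string from this answer: 'a":b':c::"':"
print(unquoted_colon_positions("'a\":b':c::\"':\""))
```

On the example string this reports positions 6, 8 and 9, the same ones listed above.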

Upvotes: 0

Gumbo

Reputation: 655219

If the regular expression implementation supports look-around assertions, try this:

:(?:(?<=["']:)|(?=["']))

This will match any colon that is either preceded or followed by a double or single quote. So it only handles constructs like the ones you mentioned; the colon in something:firstValue would not be matched.
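A quick way to try the pattern is a few lines of Python (the lookbehind is fixed-width, so Python's re module accepts it). The pattern is unchanged from above; count_matches is just a helper name for this sketch. It also shows one limitation: a quoted value that itself ends with a colon, as in 'no:', still produces a match.

```python
import re

# The pattern from this answer, unchanged.
ADJACENT_TO_QUOTE = re.compile(r""":(?:(?<=["']:)|(?=["']))""")

def count_matches(text):
    """Count colons that sit directly next to a quote character."""
    return len(ADJACENT_TO_QUOTE.findall(text))

for sample in (
    "something:'firstValue':'secondValue'",
    "something:'no:match'",
    "something:firstValue",
    "something:'no:'",  # colon inside quotes but next to the closing quote
):
    print(sample, "->", count_matches(sample))
```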

It would be better to build a small parser that reads the input byte by byte and remembers whether a quotation is currently open.
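Such a parser could look roughly like this in Python. It is a minimal sketch under a few assumptions: it iterates over characters rather than raw bytes, treats ' and " as the only quote characters, and does not handle escaped quotes. The function name unquoted_colon_indexes is made up for the example.

```python
def unquoted_colon_indexes(text):
    """Scan left to right, tracking whether a quotation is open."""
    indexes = []
    open_quote = None  # None when outside quotes, else the opening quote char
    for i, ch in enumerate(text):
        if open_quote is None:
            if ch in ("'", '"'):
                open_quote = ch  # a quotation starts here
            elif ch == ":":
                indexes.append(i)  # colon outside any quotation
        elif ch == open_quote:
            open_quote = None  # the quotation is closed again
    return indexes

print(unquoted_colon_indexes("something:'no:match'"))
```

Because only the matching quote character closes a quotation, a " inside '...' (or the reverse) is treated as ordinary text.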

Upvotes: 7

Daniel Brückner

Reputation: 59645

Oops ... I missed the point. Forget the rest of this answer. This is quite hard to do because regex is not good at counting balanced characters (the .NET implementation, for example, has an extension that can do it, but it's a bit complicated).

You can use negated character classes to do this.

[^'"]:[^'"]

You can further wrap the character classes in non-capturing groups.

(?:[^'"]):(?:[^'"])

Or you can use lookaround assertions.

(?<!['"]):(?!['"])
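A quick check in Python hints at why this answer opens with a retraction. Run against the question's samples, the assertion version skips the colons that sit next to quotes and instead picks up the one inside 'no:match'. The pattern is copied as-is; match_positions is a helper name invented for this sketch.

```python
import re

# The lookaround version from this answer, unchanged.
NOT_NEXT_TO_QUOTE = re.compile(r"""(?<!['"]):(?!['"])""")

def match_positions(text):
    """Return 0-based indexes of colons with no quote on either side."""
    return [m.start() for m in NOT_NEXT_TO_QUOTE.finditer(text)]

print(match_positions("something:'no:match'"))
print(match_positions("something:'firstValue':'secondValue'"))
```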

Upvotes: 1
