Sean Nilan

Reputation: 1745

Regexp Question - Negating a captured character

I'm looking for a regular expression that matches either single-quoted or double-quoted strings and allows the opposite quote character inside the string. For example, both of the following would be legal strings: "hello 'there' world" and 'hello "there" world'

The regexp I'm using uses negative lookahead and is as follows:

(['"])(?:(?!\1).)*\1

I think this works, but what if the language doesn't support negative lookahead? Is there another way to do this without using alternation?
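To make the behavior concrete, here is a minimal sketch that runs the pattern above through Python's re module. Python is only an assumption for illustration (the question is not tied to one language), and the sample text and variable names are made up for the demo.

```python
import re

# Pattern from the question: (?!\1). consumes one character only if
# that character is not the captured opening quote.
QUOTED = re.compile(r"""(['"])(?:(?!\1).)*\1""")

# Two well-formed strings followed by one that never closes.
text = '''"hello 'there' world" 'hello "there" world' "unterminated'''

matches = [m.group(0) for m in QUOTED.finditer(text)]
for s in matches:
    print(s)
```

Only the two properly closed strings are reported; the trailing fragment has no closing double quote, so nothing matches there.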

EDIT:

I know I can use alternation. This is more of a hypothetical question. Say I had 20 different characters in the initial character class: I wouldn't want to write out 20 different alternatives. What I'm actually trying to do is negate the captured character without using lookahead, lookbehind, or alternation.

Upvotes: 1

Views: 192

Answers (3)

Ryan C. Thompson

Reputation: 42020

In the general case, regexps are not really the answer. You might be interested in something like Text::ParseWords, which tokenizes text, accounting for nested quotes, backslashed quotes, backslashed spaces, and other oddities.
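As a rough illustration of the tokenizer approach, here is a sketch using Python's standard shlex module. This is a swapped-in stand-in, not Text::ParseWords itself (which is a Perl module); it is used here only because it handles quoting in a similar shell-like way. The sample text is invented.

```python
import shlex

# Shell-style input: a double-quoted string containing single quotes,
# a single-quoted string containing double quotes, and a
# backslash-escaped space.
text = r'''say "hello 'there' world" 'hello "there" world' one\ token'''

# shlex.split uses POSIX shell quoting rules by default.
tokens = shlex.split(text)
for t in tokens:
    print(repr(t))
```

Each quoted string comes back as a single token with its outer quotes removed, and the escaped space keeps "one token" together.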

Upvotes: 1

mattmc3

Reputation: 18335

This is actually much simpler than you may have realized. You don't really need the negative look-ahead. What you want to do is a non-greedy (or lazy) match like this:

(['"]).*?\1

The ? character after the .* is the important part. It tells the engine to consume as few characters as possible before trying the next part of the regex. So you match either kind of quote, then zero or more characters, up to the first character that matches whichever quote you started with. This distinction is usually described as greedy vs. non-greedy (lazy) matching.
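Here is a small sketch of the lazy pattern in Python (an assumed language; the sample strings and names are hypothetical). On its own it finds the same strings as the lookahead pattern from the question. The second half shows a caveat worth checking in your own engine: when more pattern follows the closing \1, backtracking can stretch .*? past the first matching quote, which the lookahead version does not allow.

```python
import re

LAZY = re.compile(r"""(['"]).*?\1""")              # pattern from this answer
TEMPERED = re.compile(r"""(['"])(?:(?!\1).)*\1""")  # pattern from the question

text = '''say "hello 'there' world" or 'hello "there" world' now'''
lazy_found = [m.group(0) for m in LAZY.finditer(text)]
tempered_found = [m.group(0) for m in TEMPERED.finditer(text)]
print(lazy_found == tempered_found)

# Same two patterns, but now a semicolon must follow the closing quote.
LAZY_SEMI = re.compile(r"""(['"]).*?\1;""")
TEMPERED_SEMI = re.compile(r"""(['"])(?:(?!\1).)*\1;""")

line = '"a" "b";'
lazy_hit = LAZY_SEMI.search(line).group(0)          # spans both strings
tempered_hit = TEMPERED_SEMI.search(line).group(0)  # only the second string
print(lazy_hit)
print(tempered_hit)
```

For a standalone match the two patterns agree, so the lazy form is the simpler choice there; inside a larger regex the difference can matter.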

Upvotes: 7

Sean

Reputation: 29772

Sure:

'([^']*)'|"([^"]*)"

On a successful match in Perl, the $+ variable will hold the contents of whichever alternative matched.
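For readers not using Perl, the same idea can be sketched in Python, where the match object's lastindex attribute plays a role similar to $+ for this pattern. Python is an assumption here, and the sample text is made up.

```python
import re

# The alternation from this answer, written in verbose mode so each
# alternative can carry a comment.
ALT = re.compile(r"""
    '([^']*)'    # group 1: body of a single-quoted string
  | "([^"]*)"    # group 2: body of a double-quoted string
""", re.VERBOSE)

text = '''say "hello 'there' world" or 'hello "there" world' now'''

# m.lastindex is the number of the last group that took part in the
# match, so m.group(m.lastindex) is the body of whichever alternative hit.
bodies = [m.group(m.lastindex) for m in ALT.finditer(text)]
for b in bodies:
    print(b)
```

Only one of the two groups participates in any given match, which is why picking the last matched group is enough to recover the string body.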

Upvotes: 1
