Reputation: 1745
I'm looking for a regular expression that allows for either single-quoted or double-quoted strings, and allows the opposite quote character within the string. For example, the following would both be legal strings: "hello 'there' world" 'hello "there" world'
The regexp I'm using uses negative lookahead and is as follows:
(['"])(?:(?!\1).)*\1
This would work I think, but what about if the language didn't support negative lookahead. Is there any other way to do this? Without alternation?
EDIT:
I know I can use alternation. This was more of just a hypothetical question. Say I had 20 different characters in the initial character class. I wouldn't want to write out 20 different alternations. I'm trying to actually negate the captured character, without using lookahead, lookbehind, or alternation.
Upvotes: 1
Views: 192
Reputation: 42020
In the general case, regexps are not really the answer. You might be interested in something like Text::ParseWords, which tokenizes text, accounting for nested quotes, backslashed quotes, backslashed spaces, and other oddities.
Upvotes: 1
Reputation: 18335
This is actually much simpler than you may have realized. You don't really need the negative look-ahead. What you want to do is a non-greedy (or lazy) match like this:
(['"]).*?\1
The ?
character after the .*
is the important part. It says, consume the minimum possible characters before hitting the next part of the regex. So, you get either kind of quote, and then you go after 0-M characters until you encounter a character matching whichever quote you first ran into. You can learn more about greedy matching vs. non-greedy here and here.
Upvotes: 7
Reputation: 29772
Sure:
'([^']*)'|"([^"]*)"
On a successful match, the $+
variable will hold the contents of whichever alternate matched.
Upvotes: 1