wyc

Reputation: 55273

How to only catch what's between quotation marks in the following regex?

The following regex matches substrings inside quotation marks:

^("[^"]*")$

"Dialogue," she said. "More dialogue."

I don't want to catch the quotation marks (only what's inside the quotation marks). So I figured I should use a lookahead and a lookbehind:

^((?<=")[^"]*(?="))$

But now the regex isn't matching anything.

Why is this? And how to fix it?

https://regexr.com/5spdt

EDIT: Removing the outer capture group (and the ^/$ anchors) kind of worked, but now " she said. " is being captured too:

(?<=")[^"]*(?=")

Upvotes: 1

Views: 653

Answers (2)

The fourth bird

Reputation: 163277

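You get too many matches because the lookarounds do not consume the ", so any run of characters between two double quotes is a match, including the text between a closing quote and the next opening quote (such as " she said. ").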
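To see this concretely, here is a small JavaScript sketch that runs the regex from the question's edit (with the g flag) against the sample text. It assumes a runtime with lookbehind support, such as a recent Node.js or browser:

```javascript
// The question's sample text, written as a JS string literal.
const text = '"Dialogue," she said. "More dialogue."';

// Lookarounds check for a " without consuming it, so every run of
// non-quote characters that sits between two quotes qualifies,
// including the narration between a closing and an opening quote.
const lookaroundOnly = /(?<=")[^"]*(?=")/g;

const matches = text.match(lookaroundOnly);
console.log(matches); // [ 'Dialogue,', ' she said. ', 'More dialogue.' ]
```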

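You can assert a " to the left, then match any characters except " until you can assert a " to the right, followed by optional pairs of "" (with other characters allowed around them) until the end of the string. Requiring an even number of quotes after the closing one means the match can only start right after an opening quote.

This assumes there are no escaped double quotes between the double quotes.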

 (?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)
  • (?<=") Positive lookbehind, assert " directly to the left of the current position
  • [^"]* Match 0+ times any char except "
  • (?= Positive lookahead, assert to the right
    • " Match closing "
    • (?:[^"]*"[^"]*")* Match optional pairs of "" (each quote may be preceded by chars other than ")
    • [^"]*$ Match optional chars other than " and assert the end of the string
  • ) Close lookahead
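As a rough check, the same sample text run through this pattern in JavaScript (again assuming lookbehind support in the runtime) keeps only the two quoted parts:

```javascript
const text = '"Dialogue," she said. "More dialogue."';

// (?<=")  a quote directly to the left
// [^"]*   the quoted content itself
// (?=...) a closing quote, then only pairs of quotes up to the end,
//         so the match must start right after an opening quote
const quotedContent = /(?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)/g;

const result = text.match(quotedContent);
console.log(result); // [ 'Dialogue,', 'More dialogue.' ]
```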

Regex demo

Upvotes: 1

AD7six

Reputation: 66188

KISS

The regex in the question is overly specific (exploded):

^        # Start of string
(        # Begin capturing group
"        # Opening quote
[^"]*    # Zero or more chars other than "
"        # Closing quote
)        # End capturing group
$        # End of string

This will only match strings of the form:

"some string"

It would not, for example, match strings of the form:

anything "some string"   (does not start with a quote)
"some string" anything   (does not end with a quote)
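A quick JavaScript check of the three forms above (a sketch; the sample strings are simply the ones listed here) shows that the anchored pattern accepts only the first:

```javascript
// The anchored pattern from the question: ^ and $ force the quoted
// string to span the entire input.
const anchored = /^("[^"]*")$/;

const samples = [
  '"some string"',
  'anything "some string"',
  '"some string" anything',
];

const results = samples.map((s) => anchored.test(s));
console.log(results); // [ true, false, false ]
```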

So, given that the goal is to capture quoted strings, just leave the quotes out of the capturing group:

"([^"]*)"

And then reference the capturing group, not the whole matching string.

Applied to JavaScript

Consider the following code:

const input = '"one" something "two" something "three" etc.';
const regex = /"([^"]*)"/;
const match = input.match(regex);

match contains ["\"one\"", "one"]: index 0 is the full matching string and index 1 is the first capturing group. Because the regex has no g flag, only the first quoted string is matched. Adapt the JS code as relevant.
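To collect every quoted string rather than only the first, one option is String.prototype.matchAll with the g flag. This is a sketch assuming an ES2020+ runtime; the helper name quotedStrings is just illustrative:

```javascript
const input = '"one" something "two" something "three" etc.';

// With the g flag, matchAll yields one match object per quoted string;
// m[1] is the capturing group, i.e. the text without the quotes.
function quotedStrings(str) {
  return Array.from(str.matchAll(/"([^"]*)"/g), (m) => m[1]);
}

console.log(quotedStrings(input)); // [ 'one', 'two', 'three' ]
console.log(quotedStrings('"Dialogue," she said. "More dialogue."'));
// [ 'Dialogue,', 'More dialogue.' ]
```

This needs no lookarounds at all: the quotes are consumed by the match, and only the capturing group is kept.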

Upvotes: 1
