VaidAbhishek

Reputation: 6144

Regex to match a string having escaped characters

I want to write a regular expression that matches the following specification for string literals. For the last 10 hours, I've gone crazy formulating various regular expressions, none of which seem to work. Finally I've boiled it down to this one:

Basically, the requirements are the following:

  1. A string literal has to be matched, so I'm matching everything up to the last ". In between there could be a \", which should not end the string.
  2. We should also be able to escape anything, including a \n, with a '\'.
  3. Only an unescaped '"' character can end the match, nothing else.

Some sample strings that I need to match correctly are the following:

  1. \a\b\"\n" => I should match the following characters: '\', 'a', '\', 'b', '\', '"', '\', 'n', '"'
  2. \"this is still inside the string" => should match the whole text, including the last '"'
  3. 'm about to escape to a newline \'\n'" => There's a \n character in this string, but the string should still match everything from the starting 'm' to the ending '"'.

Could someone please help me formulate such a regex? In my opinion the regex I've provided should do the job, but it fails for no apparent reason.

Upvotes: 1

Views: 2273

Answers (3)

Gumbo

Reputation: 655259

Your regular expression is almost right; you just need to be aware that inside a character class the period . is just a literal . and does not mean "any character except newline". So:

([^"\\]|\\(.|\n))*\"

Or:

([^"\\]|\\[\s\S])*\"
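As a rough check, the second pattern can be tried against the question's samples in Python. This is only a sketch: the name STRING_TAIL is made up for illustration, and the third sample is my reading of the question (a backslash followed by a real newline), which may not be exactly what the asker meant.

```python
import re

# Pattern from the answer above: either a character that is neither a quote
# nor a backslash, or a backslash followed by any character (newline
# included), repeated, then the closing quote.
STRING_TAIL = re.compile(r'([^"\\]|\\[\s\S])*"')

samples = [
    r'\a\b\"\n"',                            # sample 1, backslash + letter n
    r'\"this is still inside the string"',   # sample 2
    "m about to escape to a newline \\\n\"", # assumed form of sample 3: backslash + real newline
]

for s in samples:
    m = STRING_TAIL.match(s)
    # True when the whole sample, including the final quote, is matched
    print(m is not None and m.group(0) == s)

# An escaped quote with no closing quote after it should not match at all
print(STRING_TAIL.match(r'abc\"'))
```

Because the two alternatives inside the group can never match the same character, the engine has only one way to consume each position.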

Upvotes: 2

buckley

Reputation: 14089

I assume that your string also starts with a " (shouldn't your examples start with one?).

A lookaround construct seems the most natural to me:

".*?"(?<!\\")

Given the input

"test" test2 "test \a test"  "test \"test" "test\"" 

this will match:

"test"
"test \a test"
"test \"test"
"test\""

The regex reads:

Match the character “"” literally «"»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “"” literally «"»
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\")»
   Match the character “\” literally «\\»
   Match the character “"” literally «"»
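For anyone who wants to reproduce this, here is a small Python sketch of the same regex on the input above (LOOKBEHIND and tricky are names I made up). It also tries one extra input that is not in the original answer, a string ending in an escaped backslash, because the lookbehind only inspects the single character before the quote.

```python
import re

# The lookbehind regex from this answer, unchanged.
LOOKBEHIND = re.compile(r'".*?"(?<!\\")')

text = r'"test" test2 "test \a test"  "test \"test" "test\""'
matches = LOOKBEHIND.findall(text)
for m in matches:
    print(m)

# Extra case (my own assumption, not from the answer): the first literal
# ends with an escaped backslash, so its closing quote is preceded by "\".
tricky = r'"a\\" b "c"'
print(LOOKBEHIND.findall(tricky))  # the first match runs past the real end of "a\\"
```

So the approach holds for the inputs shown here, but it seems to need more care if a backslash itself can be escaped right before the closing quote.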

Upvotes: 0

MRAB

Reputation: 20654

I think that this would be more efficient:

[^"\\]*(\\.[^"\\]*)*\"
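A quick Python sketch of this pattern follows. One assumption on my part: I compile it with re.DOTALL so that the . after the backslash also accepts a newline, which the question asks for; the names UNROLLED and NO_DOTALL are only for illustration.

```python
import re

PATTERN = r'[^"\\]*(\\.[^"\\]*)*\"'

# With DOTALL (assumed here), "\\." also covers backslash + newline.
UNROLLED = re.compile(PATTERN, re.DOTALL)
# Without the flag, "." stops at a newline.
NO_DOTALL = re.compile(PATTERN)

samples = [
    r'\a\b\"\n"',
    r'\"this is still inside the string"',
    "m about to escape to a newline \\\n\"",  # backslash + real newline (my reading of sample 3)
]

for s in samples:
    m = UNROLLED.match(s)
    print(m is not None and m.group(0) == s)

# The escaped newline is only accepted when DOTALL is set
print(NO_DOTALL.match(samples[2]))
```

The pattern consumes runs of ordinary characters in one step and only enters the group at each backslash, which is presumably where the efficiency gain comes from.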

Upvotes: 1
