NotAgain

Regex for selecting the double quotes inside the curly braces while ignoring the ones outside

Yesterday I asked a question on the same topic, but it was not detailed enough. The suggestion I received seemed to solve my problem, but there are edge cases, so I am reposting, this time with more details.

Here is the string:

"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",

I want to match the double quotes inside the curly braces (highlighted in an image in the original post, not reproduced here) and ignore the ones outside.

The regex I have so far is (?:[^{]+):(.*)$, but it selects everything up to the end of the line, and the result comes back in two groups.
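To see why this pattern runs to the end of the line, here is a quick check with Python's re module (used only for illustration; it is not the Lucene engine). Because [^{]+ cannot cross a {, the first colon it can reach is the one just inside the opening brace, and the greedy (.*)$ then captures everything after that colon up to the end of the line. LOG_LINE is simply a variable name chosen for the sketch.

```python
import re

# The sample log line from the question.
LOG_LINE = '"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",'

m = re.search(r"(?:[^{]+):(.*)$", LOG_LINE)

# [^{]+ cannot cross "{", so the whole match (group 0) starts just inside the first brace...
print(m.start() == LOG_LINE.index("{") + 1)  # True
# ...and the greedy (.*)$ puts the rest of the line into group 1.
print(m.group(1))  # ""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",
```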

Update: I can now select the part between the curly braces with {(?:\n|.)*}. I still need a way to match the double quotes within that selection.
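A small sketch, again in Python's re rather than Lucene, of what that selection covers. Because * is greedy, the pattern selects from the first { to the last }, so the nested {""Hello"",""This""} ends up in the same selection. The (?:\n|.) alternation only matters for multi-line input; in Python, the re.DOTALL flag makes . match newlines too, which gives the same span here.

```python
import re

# The sample log line from the question.
LOG_LINE = '"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",'

# {(?:\n|.)*} from the question, with the braces escaped for clarity.
braced = re.search(r"\{(?:\n|.)*\}", LOG_LINE)
# re.DOTALL lets "." also match newlines, so this selects the same span.
braced_dotall = re.search(r"\{.*\}", LOG_LINE, re.DOTALL)

selection = braced.group(0)
print(selection)  # {""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}
print(selection == braced_dotall.group(0))  # True

# Number of doubled quotes inside the selection.
print(selection.count('""'))  # 12
```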

Update: The following regex works, but I am not sure about the performance of this approach:

""(?=[a-zA-Z0-9])|""(?=})|""(?=:)|(?<=[a-zA-Z0-9])""

Performance is a particular concern because this regex will run against each of the million logs being ingested.
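A minimal sketch, assuming Python's re module (not the Lucene engine), that checks this pattern against the sample line. The pattern is compiled once at module level and reused for every line; doubled_quote_offsets is a hypothetical helper name. The last call uses a made-up line with an empty value to show that the pattern decides only from the neighbouring characters.

```python
import re

# The question's pattern, compiled once and reused for every log line.
DOUBLED_QUOTE = re.compile(r'""(?=[a-zA-Z0-9])|""(?=})|""(?=:)|(?<=[a-zA-Z0-9])""')


def doubled_quote_offsets(line: str) -> list[int]:
    # Start offset of every doubled quote the pattern matches.
    return [m.start() for m in DOUBLED_QUOTE.finditer(line)]


# The sample log line from the question.
LOG_LINE = '"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",'

brace_start, brace_end = LOG_LINE.index("{"), LOG_LINE.rindex("}")
offsets = doubled_quote_offsets(LOG_LINE)
print(len(offsets))  # 12
# All matches are inside the braces; the empty fields after "}" are skipped.
print(all(brace_start < o < brace_end for o in offsets))  # True

# Made-up edge case: an empty value. Only the last two of the four
# quotes in """" are matched; the first two are skipped.
print(doubled_quote_offsets('{""Id"":""""}'))  # [1, 5, 10]
```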

Note: I am trying to run this in Elasticsearch. According to Elastic, the Lucene regular expression engine is not Perl-compatible and supports a smaller range of operators.

Answers (1)

Andrew Kang

There are two ways to do that.

I suggest using match offsets, which most regex libraries provide. An offset tells you where a matched text is located in the input.

First, use this regex to find where the curly braces are:

{.+}

Say the offsets of this match run from 3 to 21.

Then use this simple regex:

""

Its match offsets form an array like ([5,6], [12,13], ...).

Finally, use a for loop over that array to pick out the double quotes whose offsets fall inside the braces (between 3 and 21 in this example).
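The offsets approach can be sketched in Python as follows (assuming Python's re module; quote_spans_inside_braces is a hypothetical name). Match.span() returns the (start, end) offsets the steps above refer to, and the for loop keeps only the doubled-quote spans that fall inside the {.+} span.

```python
import re

# The sample log line from the question.
LOG_LINE = '"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",'


def quote_spans_inside_braces(line: str) -> list[tuple[int, int]]:
    # Step 1: the greedy {.+} runs from the first "{" to the last "}".
    braces = re.search(r"\{.+\}", line)
    if braces is None:
        return []
    brace_start, brace_end = braces.span()

    # Steps 2 and 3: walk the offsets of every doubled quote in the line
    # and keep only those that fall inside the brace span.
    inside = []
    for m in re.finditer(r'""', line):
        if brace_start <= m.start() and m.end() <= brace_end:
            inside.append(m.span())
    return inside


spans = quote_spans_inside_braces(LOG_LINE)
print(len(spans))  # 12
print(spans[:2])   # [(27, 29), (31, 33)]
```

The empty fields after the closing brace also contain doubled quotes, but their offsets lie past the brace span, so the loop drops them.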

The other method is the following regex.

(?<={|{[^}]|{[^}][^}]|{[^}][^}][^}]|{[^}][^}][^}][^}]|{[^}][^}][^}][^}]|{[^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}]|{[^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}][^}])""|""(?=[^{]*})
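Python's re module rejects this pattern as written, because it only accepts fixed-width lookbehinds. The sketch below re-expresses the same idea under that restriction by splitting the lookbehind into one fixed-width lookbehind per gap length, assuming a maximum gap of 16 characters (MAX_GAP, assumed to match the longest branch above). TRICKY is a made-up line that shows the limit: a doubled quote more than MAX_GAP characters after a { and followed by a nested { matches neither branch.

```python
import re

# Python's re only allows fixed-width lookbehinds, so build one per gap
# length n: "{" followed by exactly n characters that are not "}".
MAX_GAP = 16  # assumed longest gap, as in the longest branch above
LOOKBEHINDS = "|".join(r"(?<=\{[^}]{%d})" % n for n in range(MAX_GAP + 1))
# First branch: "" shortly after a "{". Second branch: "" followed by a
# "}" with no "{" in between.
PATTERN = re.compile(r'(?:%s)""|""(?=[^{]*\})' % LOOKBEHINDS)


def matched_offsets(line: str) -> list[int]:
    return [m.start() for m in PATTERN.finditer(line)]


# The sample log line from the question.
LOG_LINE = '"2019/03/19","LegacyApp","{""Id"":""345-dg8"",{""Hello"",""This""},""Fake"":""Sym""}","","","(null)","",'
print(len(matched_offsets(LOG_LINE)))  # 12

# Made-up line: the "" after 345-dg8 is 24 characters past the "{" and is
# followed by a nested "{", so neither branch matches it.
TRICKY = '{""Identifier"":""345-dg8"",{""A""}}'
print(TRICKY.count('""'), len(matched_offsets(TRICKY)))  # 6 5
```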
