Kyle Hobbs

Reputation: 457

Using Regex/Replace to replace escaped quotes with quotes, unless they would be in a string after the replace

I am receiving JSON from an API. Unfortunately, all the nested objects are returned as strings. I am trying to use .replace() to turn each of those strings back into a nested JSON object, so that "{ becomes {, }" becomes }, and \" becomes ".

The original issue is that when a value actually needs escaped quotes, they are also replaced by regular quotes, which causes a JSON syntax error. I almost have it worked out, albeit in a seemingly clumsy way, by relying on some things I know will remain consistent about the data. Here is how I have it working now:

Example of a value which should be a nested object but is actually a string (ignore the newlines, I only added them for readability):

"title": "{
    \"en\": \"Who is Dwayne \"the rock\" Johnson?\",
    \"es\": \"Quien es Dwayne \"la roca\" Johnson?\"
}"

My replace calls:

.replace(/"\s*{/gim, '{') // removes quote from opening bracket
.replace(/\\"en\\"/gim, '"en"') // removes escape character from "en"
.replace(/\\"es\\"/gim, '"es"') // removes escape character from "es"
.replace(/:\s*\\"/gim, ':"') // removes escape character from " following :
.replace(/\\"\s*,/gim, '",') // removes escape character from " preceding ,
.replace(/\\"\s*}\s*"/gim, '"}') // removes escape character for quote preceding } and removes quote from closing bracket

For that example this works and outputs the desired result:

"title": {
    "en": "Who is Dwayne \"the rock\" Johnson?",
    "es": "Quien es Dwayne \"la roca\" Johnson?"
}

However, my solution relies on the colons and commas to know which escaped quotes should be replaced with regular quotes. If the actual content had an escaped quote followed by a comma, for example, this would break:

"title": "{
    \"en\": \"Who is Dwayne \"the rock\", Johnson?\",
    \"es\":\"Quien es Dwayne \"la roca\", Johnson?\"
}"

The same happens if an escaped quote in the actual content is preceded by a colon. I've experimented with lookahead and lookbehind, but I'm not sure they will work, as the characters around needed and unneeded escapes are pretty much the same.
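To make the failure concrete, here is a minimal reproduction. It assumes the raw text looks exactly like the two examples above, with the newlines removed and wrapped in an outer object; unwrapNested and tryParse are hypothetical helper names used only for this sketch.

```javascript
// The replace chain from above, wrapped in a function so it can be reused.
function unwrapNested(raw) {
  return raw
    .replace(/"\s*{/gim, '{')
    .replace(/\\"en\\"/gim, '"en"')
    .replace(/\\"es\\"/gim, '"es"')
    .replace(/:\s*\\"/gim, ':"')
    .replace(/\\"\s*,/gim, '",')
    .replace(/\\"\s*}\s*"/gim, '"}');
}

// Returns the parsed value, or null when JSON.parse throws.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// String.raw keeps every backslash exactly as typed.
const works = String.raw`{"title": "{\"en\": \"Who is Dwayne \"the rock\" Johnson?\",\"es\": \"Quien es Dwayne \"la roca\" Johnson?\"}"}`;
const breaks = String.raw`{"title": "{\"en\": \"Who is Dwayne \"the rock\", Johnson?\",\"es\":\"Quien es Dwayne \"la roca\", Johnson?\"}"}`;

// First example: the chain yields valid JSON.
console.log(tryParse(unwrapNested(works)));

// Second example: the quote before the comma gets unescaped too,
// so the string ends early and JSON.parse throws (tryParse gives null).
console.log(tryParse(unwrapNested(breaks)));
```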

Is this possible to do with regex and replace expressions, and if so, how?

Upvotes: 0

Views: 75

Answers (1)

Peter Thoeny

Reputation: 7616

I think your nested JSON example has an escaping issue. It should be:

"title": "{
  \"en\": \"Who is Dwayne \\\"the rock\\\" Johnson?\",
  \"es\": \"Quien es Dwayne \\\"la roca\\\" Johnson?\"
}"

Ignore the newlines; they are added for clarity. So the title value is a nested JSON string, where:

  • " is escaped as \"
  • \ is escaped as \\
  • \" is escaped as \\\" (i.e. \\ followed by \")
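These escaping levels can be reproduced by letting JSON.stringify do the work twice. This is only a sketch, assuming the API builds the nested value by serializing the inner object and then embedding the result as a string; the variable names are illustrative.

```javascript
// Start from the real inner object, with plain quotes in the text.
const inner = { en: 'Who is Dwayne "the rock" Johnson?' };

// First level: quotes inside the value become \"
const innerJson = JSON.stringify(inner);

// Second level: innerJson is embedded as a string value, so every "
// becomes \" and every existing \" becomes \\\"
const outerJson = JSON.stringify({ title: innerJson });

console.log(innerJson);
console.log(outerJson);

// Parsing twice undoes both levels and recovers the original text.
console.log(JSON.parse(JSON.parse(outerJson).title).en);
```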

With the escaping fixed, you can parse the nested JSON in two steps, with no need for regex:

let jsonString = `{
  "someKey": "someValue",
  "someNum": 42,
  "title": "{\\"en\\": \\"Who is Dwayne \\\\\\"the rock\\\\\\" Johnson?\\",\\"es\\": \\"Quien es Dwayne \\\\\\"la roca\\\\\\" Johnson?\\"}"
}`;
console.log('jsonString:', jsonString);
let obj = JSON.parse(jsonString);
console.log('obj:', obj);
let titleObj = JSON.parse(obj.title);
console.log('titleObj:', titleObj);
console.log('titleObj.en value:', titleObj.en);

Output:

jsonString: {
  "someKey": "someValue",
  "someNum": 42,
  "title": "{\"en\": \"Who is Dwayne \\\"the rock\\\" Johnson?\",\"es\": \"Quien es Dwayne \\\"la roca\\\" Johnson?\"}"
}
obj: {
  "someKey": "someValue",
  "someNum": 42,
  "title": "{\"en\": \"Who is Dwayne \\\"the rock\\\" Johnson?\",\"es\": \"Quien es Dwayne \\\"la roca\\\" Johnson?\"}"
}
titleObj: {
  "en": "Who is Dwayne \"the rock\" Johnson?",
  "es": "Quien es Dwayne \"la roca\" Johnson?"
}
titleObj.en value: Who is Dwayne "the rock" Johnson?
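If several keys hold nested JSON strings, the second JSON.parse call can be generalized. The following is a rough sketch, not a definitive implementation: parseNestedJson is a hypothetical helper, and it assumes that any string value starting with { or [ is meant to be nested JSON (strings that fail to parse are left untouched).

```javascript
// Hypothetical helper: walk a parsed value and parse any string
// that itself contains a JSON object or array.
function parseNestedJson(value) {
  if (typeof value === 'string') {
    const trimmed = value.trim();
    // Only attempt strings that look like an object or an array.
    if (/^[\[{]/.test(trimmed)) {
      try {
        return parseNestedJson(JSON.parse(trimmed));
      } catch (err) {
        return value; // not valid JSON, keep the original string
      }
    }
    return value;
  }
  if (Array.isArray(value)) {
    return value.map(parseNestedJson);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, parseNestedJson(v)])
    );
  }
  return value; // numbers, booleans, null
}

// Build a correctly escaped response like the one above.
const response = JSON.stringify({
  someKey: 'someValue',
  someNum: 42,
  title: JSON.stringify({ en: 'Who is Dwayne "the rock" Johnson?' }),
});

const result = parseNestedJson(JSON.parse(response));
console.log(result.title.en);
```

The trade-off is that a plain text value which happens to be valid JSON would also be converted, so this only fits data where that cannot occur.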

Upvotes: 1
