geotheory

Reputation: 23670

Escape quotes within triple-quoted strings

Where quotes within JSON strings are not escaped but are instead nested in triple quotes, e.g.

j0 = '[
  {
      "A" : "no quoted bits"
  },
  {
    "A" : """this contains: "quoted" bits""",
    "B" : "no quoted bits"
  },
  {
    "A" : "no quoted bits",
    "B" : """this contains: "quoted" and "more quoted" bits"""
  }
]'

reading it into R fails with a parse error, e.g.

jsonlite::fromJSON(j0)
#> Error: parse error: after key and value, inside map, I expect ',' or '}'
#>            bits"   },   {     "A" : """this contains: "quoted" bits"""
#>                      (right here) ------^

I've cobbled together a hacky workaround

# function for parsing strings where quotes are not escaped but nested inside triple-quotes
escape_triple_quoted = function(j){
  j_split = strsplit(j, '"{3}')[[1]]        # split on every run of three quotes
  f = seq_along(j_split) %% 2 == 0          # even pieces sat inside triple quotes
  j_split[f] = gsub('"', '\\\\"', j_split[f])  # escape the quotes in those pieces
  paste(j_split, collapse = '"')            # rejoin with a single quote per delimiter
}

escape_triple_quoted(j0) |> jsonlite::fromJSON()
#>                              A                                              B
#> 1               no quoted bits                                           <NA>
#> 2 this contains: "quoted" bits                                 no quoted bits
#> 3               no quoted bits this contains: "quoted" and "more quoted" bits

but it doesn't feel like best practice. Is there a better approach?

Upvotes: 0

Views: 275

Answers (1)

G. Grothendieck

Reputation: 269852

Here is a one-line version of escape_triple_quoted using gsubfn. The gsubfn function is like gsub except that the replacement argument may be a function, which receives the capture groups of each match as input and returns the replacement for that match. The function may be written in formula notation, as is done here.

library(gsubfn)
library(jsonlite)

escape_triple_quoted2 <- function(s) {
  gsubfn('"""(.*?)"""', ~ sprintf('"%s"', gsub('"', '\\\\"', x)), s)
}

j0 |>
  escape_triple_quoted2() |>
  fromJSON()

giving

                             A                                              B
1               no quoted bits                                           <NA>
2 this contains: "quoted" bits                                 no quoted bits
3               no quoted bits this contains: "quoted" and "more quoted" bits
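If adding a package dependency is a concern, roughly the same idea can be sketched in base R with gregexpr and regmatches. This is only a sketch under some assumptions: the name escape_triple_quoted3 is made up for illustration, the input is a single string, the (?s) flag is added on the assumption that a triple-quoted value might span several lines, and backslashes already present inside a value are not handled.

```r
# Base R sketch (hypothetical name): escape the quotes inside """...""" spans.
# Assumes s is a single string, as the strsplit-based version also does.
escape_triple_quoted3 <- function(s) {
  # Locate every triple-quoted span; non-greedy so neighbouring spans stay separate.
  m <- gregexpr('(?s)""".*?"""', s, perl = TRUE)
  if (m[[1]][1] == -1L) return(s)  # no triple-quoted spans, nothing to do

  fixed <- lapply(regmatches(s, m), function(v) {
    inner <- substr(v, 4, nchar(v) - 3)   # drop the three quotes on each side
    inner <- gsub('"', '\\\\"', inner)    # same escaping call as in the question
    sprintf('"%s"', inner)                # wrap in ordinary JSON quotes
  })

  regmatches(s, m) <- fixed               # write the repaired spans back
  s
}

x <- '{"A": """say "hi" now""", "B": "plain"}'
cat(escape_triple_quoted3(x), "\n")
```

The result can be piped into fromJSON() in the same way as above. Like the other versions, it cannot tell where a value ends if the value itself finishes with a quote immediately before the closing triple quote.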

Upvotes: 1
