Reputation: 23670
Where quotes within JSON strings are not escaped but are instead nested in triple-quotes, e.g.
j0 = '[
{
"A" : "no quoted bits"
},
{
"A" : """this contains: "quoted" bits""",
"B" : "no quoted bits"
},
{
"A" : "no quoted bits",
"B" : """this contains: "quoted" and "more quoted" bits"""
}
]'
reading it into R fails with a parse error, e.g.
jsonlite::fromJSON(j0)
#> Error: parse error: after key and value, inside map, I expect ',' or '}'
#> bits" }, { "A" : """this contains: "quoted" bits"""
#> (right here) ------^
I've cobbled together a hacky workaround:
# function for parsing strings where quotes are not escaped but nested inside triple-quotes
escape_triple_quoted = function(j){
  # split on triple quotes; even-numbered pieces are the triple-quoted contents
  j_split = strsplit(j, '"{3}')[[1]]
  f = seq_along(j_split) %% 2 == 0 # filter
  # backslash-escape the quotes inside those pieces
  j_split[f] = gsub('"', '\\\\"', j_split[f])
  # rejoin, putting one double quote where each triple quote was
  paste(j_split, collapse = '"')
}
escape_triple_quoted(j0) |> jsonlite::fromJSON()
#> A B
#> 1 no quoted bits <NA>
#> 2 this contains: "quoted" bits no quoted bits
#> 3 no quoted bits this contains: "quoted" and "more quoted" bits
but it doesn't feel like best practice. Is there a better approach?
Upvotes: 0
Views: 275
Reputation: 269852
Here is a one-liner for escape_triple_quoted using gsubfn. The gsubfn function is like gsub except that the second argument may be a function, which takes the capture groups of the match as input and returns the replacement for that match. It may be expressed in formula notation, as we do here.
library(gsubfn)
library(jsonlite)
escape_triple_quoted2 <- function(s) {
  # x is the text captured between the triple quotes
  gsubfn('"""(.*?)"""', ~ sprintf('"%s"', gsub('"', '\\\\"', x)), s)
}
j0 |>
  escape_triple_quoted2() |>
  fromJSON()
giving
A B
1 no quoted bits <NA>
2 this contains: "quoted" bits no quoted bits
3 no quoted bits this contains: "quoted" and "more quoted" bits
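If adding a dependency on gsubfn is not wanted, the same idea can probably be written in base R with gregexpr and regmatches<-. This is only a sketch under the same assumptions as above (each triple-quoted string sits on one line and contains no triple quotes itself); the name escape_triple_quoted_base is made up for illustration.

```r
# Hypothetical base-R variant (the name is illustrative, not from any package).
escape_triple_quoted_base <- function(s) {
  # locate every """...""" span, matching lazily so each span ends at the next """
  m <- gregexpr('"""(.*?)"""', s, perl = TRUE)
  regmatches(s, m) <- lapply(regmatches(s, m), function(x) {
    # drop the three quote characters on each side of the match
    inner <- substr(x, 4, nchar(x) - 3)
    # backslash-escape the remaining quotes and wrap in ordinary quotes
    sprintf('"%s"', gsub('"', '\\\\"', inner))
  })
  s
}

cat(escape_triple_quoted_base('{"A" : """this contains: "quoted" bits"""}'), "\n")
```

The returned string can then be passed to fromJSON in the same way as in the pipelines above.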
Upvotes: 1