Jibril

Reputation: 1037

Escape Characters in JSON Data

I start with

    mentions = GET(final_url, sig)
    json = content(mentions)

My code crashes on the next line, which is

    json2 = jsonlite::fromJSON(toJSON(json))

It gives this error:

    Error: lexical error: invalid character inside string.
      Foundation and 42nd President of the United States. Follow 
                 (right here) ------^

I'm dealing with some JSON data. Here is one small piece of it, as printed from my variable "json":

    Lots of JSON before this....

    $statuses[[99]]$retweeted_status$user$location
    [1] "New York, NY"

    $statuses[[99]]$retweeted_status$user$description
    [1] "Founder, Clinton Foundation and 42nd President \003of the United                States. Follow @clintonfdn for \003more on my work around the world."

    $statuses[[99]]$retweeted_status$user$url
    [1] "http://t.co/gI8TIlAJHk"

As you can see, there is an escape character, \003, embedded in one of the pieces of JSON data.

The same file contains hundreds of pieces of good information, and this could happen anywhere in it. This time it happened in "description", but it could just as easily happen in a tweet or in a location.

Is there a way to "clean" escape characters from JSON before trying to do jsonlite::fromJSON(toJSON()) to avoid my code crashing here?

Upvotes: 2

Views: 2188

Answers (1)

IRTFM

Reputation: 263411

You could try something like this:

    json2 <- gsub("[\001-\026]*", "", json)

Here's a simple "test of strategy":

    > gsub("[\003-\005]*", '', "\003\004\005\027abc")
    [1] "\027abc"
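
Keep in mind that the escapes in these patterns are octal, so \026 is character code 22, while the ASCII control characters run up to \037 (code 31). Here is a rough sketch of the difference, using a made-up test string. The wider range is only an assumption about what you want removed; it also covers tab (\011) and newline (\012), so narrow it if you need to keep those.

```r
# Octal escapes: \026 is code 22, \037 is code 31.
strtoi("026", base = 8L)
strtoi("037", base = 8L)

# Made-up test string with three control characters in front of "abc".
x <- "\003\027\037abc"

# The narrower range stops at \026, so \027 and \037 are left in place.
narrow <- gsub("[\001-\026]", "", x)

# The wider range removes every control character from \001 to \037.
wide <- gsub("[\001-\037]", "", x)

print(narrow)
print(wide)
```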

If you need a better test, you should post the output of dput(head(json)).
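
Judging from the printed output in the question, json looks like a nested list rather than a single string, so it may be safer to clean each string inside it and keep the structure intact. A minimal sketch of that idea in base R follows. The helper name strip_control and the toy list are made up for illustration, and [[:cntrl:]] is an assumption about which characters you want to drop (it includes tabs and newlines).

```r
# Hypothetical helper: remove control characters from every character
# element of a nested list, leaving other elements and the structure alone.
strip_control <- function(x) {
  rapply(
    x,
    function(s) gsub("[[:cntrl:]]", "", s),
    classes = "character",
    how = "replace"
  )
}

# Toy stand-in for the parsed response (not real API output).
json <- list(statuses = list(list(
  user = list(
    location = "New York, NY",
    description = "42nd President \003of the United States.",
    followers = 10L
  )
)))

clean <- strip_control(json)

# The \003 is gone and the non-character field is untouched.
print(clean$statuses[[1]]$user$description)
print(clean$statuses[[1]]$user$followers)
```

You would then pass the cleaned list, instead of the original, to whatever conversion step comes next.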

Upvotes: 2
