Nidhi Agarwal

Reputation: 458

Error in parsing a json file in R

I have Yelp business data with 100 records, each in the following format:

{ 
    "_id" : ObjectId("5aab338ffc08b46adb7a2320"), 
    "business_id" : "Pd52CjgyEU3Rb8co6QfTPw", 
    "name" : "Flight Deck Bar & Grill", 
    "neighborhood" : "Southeast", 
    "address" : "6730 S Las Vegas Blvd", 
    "city" : "Las Vegas", 
    "state" : "NV", 
    "postal_code" : "89119", 
    "latitude" : 36.0669136, 
    "longitude" : -115.1708484, 
    "stars" : 4.0, 
    "review_count" : NumberInt(13), 
    "is_open" : NumberInt(1), 
    "attributes" : {
        "Alcohol" : "full_bar", 
        "HasTV" : true, 
        "NoiseLevel" : "average", 
        "RestaurantsAttire" : "casual", 
        "BusinessAcceptsCreditCards" : true, 
        "Music" : {
            "dj" : false, 
            "background_music" : true, 
            "no_music" : false, 
            "karaoke" : false, 
            "live" : false, 
            "video" : false, 
            "jukebox" : false
        }, 
        "Ambience" : {
            "romantic" : false, 
            "intimate" : false, 
            "classy" : false, 
            "hipster" : false, 
            "divey" : false, 
            "touristy" : false, 
            "trendy" : false, 
            "upscale" : false, 
            "casual" : true
        }, 
        "RestaurantsGoodForGroups" : true, 
        "Caters" : true, 
        "WiFi" : "free", 
        "RestaurantsReservations" : false, 
        "RestaurantsTableService" : true, 
        "RestaurantsTakeOut" : true, 
        "GoodForKids" : true, 
        "HappyHour" : true, 
        "GoodForDancing" : false, 
        "BikeParking" : true, 
        "OutdoorSeating" : false, 
        "RestaurantsPriceRange2" : NumberInt(2), 
        "RestaurantsDelivery" : false, 
        "BestNights" : {
            "monday" : false, 
            "tuesday" : false, 
            "friday" : false, 
            "wednesday" : true, 
            "thursday" : false, 
            "sunday" : false, 
            "saturday" : false
        }, 
        "GoodForMeal" : {
            "dessert" : false, 
            "latenight" : false, 
            "lunch" : true, 
            "dinner" : false, 
            "breakfast" : false, 
            "brunch" : false
        }, 
        "BusinessParking" : {
            "garage" : false, 
            "street" : false, 
            "validated" : false, 
            "lot" : true, 
            "valet" : false
        }, 
        "CoatCheck" : false, 
        "Smoking" : "no", 
        "WheelchairAccessible" : true
    }, 
    "categories" : [
        "Nightlife", 
        "Bars", 
        "Barbeque", 
        "Sports Bars", 
        "American (New)", 
        "Restaurants"
    ], 
    "hours" : {
        "Monday" : "8:30-22:30", 
        "Tuesday" : "8:30-22:30", 
        "Friday" : "8:30-22:30", 
        "Wednesday" : "8:30-22:30", 
        "Thursday" : "8:30-22:30", 
        "Sunday" : "8:30-22:30", 
        "Saturday" : "8:30-22:30"
    }
}

I need to import this into R. I have the following code:

library('jsonlite')
data <- stream_in(file("~/Desktop/business100.json"))

When I run the above code, it gives the following error:

Error: lexical error: invalid char in json text.
                         {     "_id" : ObjectId("5aab338ffc08b46adb7a2
                     (right here) ------^

I think there is a problem with the format of the JSON, but when I view the file in MongoDB it looks fine. What can I do about this? Thank you!

Upvotes: 0

Views: 152

Answers (1)

r2evans

Reputation: 160447

If the data lives in MongoDB, reading it with mongolite (as suggested in the comments) is likely the best way to go. If you cannot use it for some reason, you can strip the non-JSON wrappers (ObjectId(...), NumberInt(...)) from the text and then parse it with a regular JSON parser.

To generalize, create a vector of the wrapper names (verbatim). I assume that each wrapper has the form DiscardableProperty(save_all_here), where only the part inside the parentheses is kept, so a good starting point based on the data you've provided is:

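library(jsonlite)
# jsontxt is assumed to hold the text of one record (as in the question) as a single string
# names of the wrappers to strip
ptns <- c('ObjectId', 'NumberInt')
str(jsontxt)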
#  chr "{ \n    \"_id\" : ObjectId(\"5aab338ffc08b46adb7a2320\"), \n    \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n    \"name\" : "| __truncated__
jsontxt2 <- Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
                   ptns, init=jsontxt)
str(jsontxt2)
#  chr "{ \n    \"_id\" : \"5aab338ffc08b46adb7a2320\", \n    \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n    \"name\" : \"Flight D"| __truncated__

(Notice the absence of ObjectId.)

This parses just fine:

str(fromJSON(jsontxt2))
# List of 16
#  $ _id         : chr "5aab338ffc08b46adb7a2320"
#  $ business_id : chr "Pd52CjgyEU3Rb8co6QfTPw"
#  $ name        : chr "Flight Deck Bar & Grill"
#  $ neighborhood: chr "Southeast"
#  $ address     : chr "6730 S Las Vegas Blvd"
# ...
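
If you are not sure which wrappers occur in your export, a rough way to collect candidates for ptns is to look for identifiers that directly follow a colon and are immediately followed by an opening parenthesis. This is only a sketch: find_wrappers is a hypothetical helper (not part of jsonlite or mongolite), the sample string is a trimmed-down version of the record in the question, and a string value that happens to contain something like ": Foo(" would show up as a false positive.

```r
# Hypothetical helper: list identifiers that appear as `: Name(` in the text,
# i.e. candidates for the ptns vector. Uses base R only.
find_wrappers <- function(txt) {
  # match a colon, optional whitespace, an identifier, then an opening parenthesis
  m <- regmatches(txt, gregexpr(":[[:space:]]*[A-Za-z_]+\\(", txt))[[1]]
  # drop the leading colon/whitespace and the trailing parenthesis
  unique(gsub("^:[[:space:]]*|\\($", "", m))
}

# Trimmed-down sample in the style of the question's data
sample_txt <- '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "review_count" : NumberInt(13), "is_open" : NumberInt(1) }'
print(find_wrappers(sample_txt))
```

Check the result by eye before using it as ptns, since the pattern knows nothing about JSON string boundaries.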

Edit: the same replacement can be done in a single pass with:

jsontxt2 <- gsub(sprintf("(%s)\\(([^)]+)\\)", paste(ptns, collapse = "|")),
                 "\\2", jsontxt)

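For reuse, the single-pass replacement can be wrapped in a small function. This is a minimal sketch under the same assumption as above (every wrapper looks like Name(value) and the value contains no closing parenthesis). strip_mongo_wrappers is a hypothetical name, and the sample string is a trimmed-down version of the record in the question.

```r
# Hypothetical helper: strip wrappers such as ObjectId(...) and NumberInt(...),
# keeping only what is inside the parentheses.
strip_mongo_wrappers <- function(txt, wrappers = c("ObjectId", "NumberInt")) {
  pattern <- sprintf("(%s)\\(([^)]+)\\)", paste(wrappers, collapse = "|"))
  gsub(pattern, "\\2", txt)
}

# Trimmed-down sample in the style of the question's data
sample_txt <- '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "review_count" : NumberInt(13) }'
cleaned <- strip_mongo_wrappers(sample_txt)
print(cleaned)

# Parse the cleaned text if jsonlite is installed
if (requireNamespace("jsonlite", quietly = TRUE)) {
  str(jsonlite::fromJSON(cleaned))
}
```

Because the quotes inside ObjectId("...") are kept, the _id comes through as a JSON string, while NumberInt(13) becomes a plain number.
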
Upvotes: 1
