Reputation: 13
I am pretty new to R. I have been trying to solve one problem all day, but unfortunately without success.
I'd like to import a JSON file into R and then process it further in the same way as an imported CSV file.
My JSON file has the following structure:
{ "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano.
He is having a wonderful time playing these old hymns. The music is at
times hard to read because we think the book was published for singing
from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
}
I'd like to import the JSON file and end up with a table that consists of 9 columns (reviewerID, asin, reviewerName, etc.).
I tried the R package jsonlite, but I get the following error message:
data <- fromJSON('reviews_Office_Products.json.gz2')
Error in feed_push_parser(buf) : parse error: trailing garbage
"reviewTime": "07 19, 2013"} {"reviewerID": "A3BBNK2R5TUYGV"
(right here) ------^
Do you have any idea how I can accomplish this?
Thank you very much in advance.
Best regards Paul
Upvotes: 0
Views: 1203
Reputation: 13
In the end I did it as follows:
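library(rjson)
library(plyr)  # provides ldply()

# This assumes the file holds one JSON object per line, so read it line by line.
url <- "reviews_Office_Products.json.gz2"
con <- file(url, "r")
input <- readLines(con, -1L)
close(con)

# Parse each line once, flatten each record, and stack the rows into a data frame.
my_results <- lapply(X = input, fromJSON)
tr.review <- ldply(lapply(my_results, function(x) t(unlist(x))))
save(tr.review, file = 'tr.review.rdata')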
For my purposes this works, and I can process the data further with the tm package.
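To make the idea easier to try out, here is a minimal sketch of the same line-by-line approach on two made-up records held in memory. The sample records and the variable names `parsed`, `rows`, and `reviews` are only illustrative, and it uses base R's `do.call(rbind, ...)` in place of `ldply`, so it is not exactly the code above.

```r
library(rjson)

# Two made-up records, one JSON object per line (hypothetical sample data).
input <- c(
  '{"reviewerID": "A1", "helpful": [2, 3], "overall": 5.0, "summary": "Good"}',
  '{"reviewerID": "B2", "helpful": [0, 1], "overall": 3.0, "summary": "Okay"}'
)

# Parse each line on its own, so the parser never sees two top-level objects.
parsed <- lapply(input, fromJSON)

# unlist() flattens each record into a named character vector; the
# two-element "helpful" array becomes the entries helpful1 and helpful2.
rows <- lapply(parsed, unlist)

# Stack the vectors into a character matrix, then convert to a data frame.
reviews <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Everything is character after unlist(), so restore numeric columns by hand.
reviews$overall <- as.numeric(reviews$overall)
```

Because `unlist()` turns every value into a string, numeric columns such as `overall` probably need converting back before any further analysis.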
Thank you very much for your help. Paul
Upvotes: 1
Reputation: 4024
This works. You might need to adjust the regular expression to fit your file. Note that R regexes need double backslashes instead of single ones, because the backslash must itself be escaped inside an R string.
library(rjson)
library(magrittr)
library(dplyr)
library(lubridate)
library(stringi)
options(stringsAsFactors = FALSE)
'{ "reviews": [ { "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano.
He is having a wonderful time playing these old hymns. The music is at
times hard to read because we think the book was published for singing
from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
} { "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano.
He is having a wonderful time playing these old hymns. The music is at
times hard to read because we think the book was published for singing
from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
} ] }' %>%
writeLines("reviews_Office_Products.json.gz2")
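# Read the file as one string, insert the missing commas between objects,
# parse it, and bind the reviews into one data frame.
data =
  "reviews_Office_Products.json.gz2" %>%
  readLines %>%
  # Collapse first so the regex can also match a line break between "}" and "{".
  paste(collapse = "\n") %>%
  stri_replace_all_regex("\\}[ \\n]*\\{", "},{") %>%
  fromJSON %>%
  .[[1]] %>%
  lapply(as.data.frame) %>%
  bind_rows %>%
  select(-unixReviewTime) %>%
  mutate(asin = as.numeric(asin),
         reviewTime = mdy(reviewTime))

# Review-level columns without the "helpful" values.
review =
  data %>%
  select(-helpful) %>%
  distinct

# The "helpful" values of each review, one per row.
review__helpful =
  data %>%
  select(reviewerID, helpful) %>%
  distinct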
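The regular expression can also be tried on its own before running it on the real file. The following is a small sketch on a made-up string; `x`, `fixed`, and `json_array` are just illustrative names, and the sample objects are not from the review data.

```r
library(stringi)

# Hypothetical input: three objects separated by a newline and by a space.
x <- '{"a": 1}\n{"a": 2} {"a": 3}'

# In R source, "\\}" is the two-character regex \} (a literal closing brace).
# The pattern matches "}" and "{" separated by any run of spaces or newlines.
fixed <- stri_replace_all_regex(x, "\\}[ \\n]*\\{", "},{")

# Wrapping the comma-separated objects in brackets gives a JSON array.
json_array <- paste0("[", fixed, "]")
```

If the real file separates objects differently (tabs or carriage returns, for example), the character class `[ \\n]` would presumably need to be widened accordingly.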
Upvotes: 0