OctoCatKnows

Reputation: 419

Why can I only read one .json file at a time?

I have 500+ .json files from which I am trying to get a specific element. I cannot figure out why I cannot read more than one at a time.

This works:

library(jsonlite)
files <- list.files('~/JSON')
file1 <- fromJSON(readLines('~/JSON/file1.json'), flatten = TRUE)
result <- as.data.frame(file1$element$subdata$data)

However, regardless of trying different JSON packages (e.g., RJSONIO), I cannot apply this to the entire set of files. The error I keep getting is...

Attempt to run the same code, as a function, over all the contents of the file list:

for (i in files) {
  fromJSON(readLines(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
}

My goal is to loop through all 500+ files and extract the data. Specifically, if a file has the element 'subdata$data', I want to extract that list and put it all in a data frame.

Note: the files are being read as ASCII (Windows OS). This does not have a negative effect on single extractions, but in the loop I get 'invalid character bytes'.
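
For reference, one way to force the text to UTF-8 before parsing (a minimal sketch, assuming the files are in the machine's native encoding; iconv with sub = "" drops any bytes that cannot be converted):

# sketch: sanitize one file's text before parsing
txt <- readLines('~/JSON/file1.json', warn = FALSE)
txt <- iconv(txt, from = "", to = "UTF-8", sub = "")  # drop invalid bytes
file1 <- fromJSON(paste(txt, collapse = "\n"), flatten = TRUE)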

Update 1/25/2019

Ran the following, but it returned errors...

files <- list.files('~/JSON')
out <- lapply(files, function(fn) {
  o <- fromJSON(file(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
})

Error in file(i): object 'i' not found

Also updated the function (previously the argument was named fn while the body referenced i); this time I get UTF-8 errors...

files <- list.files('~/JSON')
out <- lapply(files, function(i, fn) {
  o <- fromJSON(file(i), flatten = TRUE)
  as.data.frame(i)$element$subdata$data
})

Error in parse_con(txt,bigint_as_char):
 lexical error: invalid bytes in UTF8 string. (right here)------^
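
As a hedged aside, a small sketch for isolating which files trip the parser, assuming 'files' holds full paths to the .json files:

# sketch: flag files whose contents are not valid JSON
files <- list.files('~/JSON', full.names = TRUE)
ok <- vapply(files, function(fn) {
  isTRUE(jsonlite::validate(paste(readLines(fn, warn = FALSE), collapse = "\n")))
}, logical(1))
files[!ok]  # the offenders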

Latest Update

I think I found a solution to the crazy 'bytes' problem. When I run readLines on the .json file first, I can then apply fromJSON,

e.g.:

json <- readLines('~/JSON/file1.json')
jsonread <- fromJSON(json)
jsondf <- as.data.frame(jsonread$element$subdata$data)
# returns a data frame with the correct information

The problem is that I cannot apply readLines to all the files within the JSON folder (the path). If I can get help with that, I think I can run...

files <- list.files('~/JSON')
for (i in files) {
  a <- readLines(i)
  o <- fromJSON(file(a), flatten = TRUE)
  as.data.frame(i)$element$subdata
}
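
One likely culprit, as a hedged aside: list.files() returns bare file names, not paths, so readLines(i) looks in the working directory rather than in ~/JSON. A minimal sketch of reading every file:

# sketch: full.names = TRUE yields paths readLines() can actually open
files <- list.files('~/JSON', pattern = '\\.json$', full.names = TRUE)
raw_data <- lapply(files, readLines, warn = FALSE)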

Needed Steps

1. Apply readLines to all 500+ .json files in the JSON folder.
2. Apply fromJSON to the files from step 1.
3. Create a data.frame that returns entries if the list (from fromJSON) contains $element$subdata$data.
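
Putting those steps together, a rough sketch (assuming full paths, and that $element$subdata$data is simply NULL in files that lack it):

library(jsonlite)
files <- list.files('~/JSON', full.names = TRUE)
out <- lapply(files, function(fn) {
  parsed <- fromJSON(paste(readLines(fn, warn = FALSE), collapse = "\n"), flatten = TRUE)
  parsed$element$subdata$data  # NULL when the element is absent
})
out <- Filter(Negate(is.null), out)  # keep only files that had the element
allout <- do.call(rbind, lapply(out, as.data.frame))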

Thoughts?

Solution (Workaround?)

Unfortunately, fromJSON still runs into trouble with the .json files. My guess is that my GET method (httr) is unable to wait/delay and load the 'pretty print', and is thus grabbing the raw .json, which in turn gives odd characters and, as a result, the ubiquitous '------^' error. Nevertheless, I was able to put together a solution; please see below. I want to post it for future folks who may have the same problem with .json files not working nicely with any R JSON package.

# keeping the same 'files' variable as earlier
raw_data <- lapply(files, readLines)
dat <- do.call(rbind, raw_data)
dat2 <- as.data.frame(dat, stringsAsFactors = FALSE)
# check that the json contents were read in
dat2[1, 1]

library(tidyr)
dat3 <- separate_rows(dat2, sep = '')
x <- unlist(raw_data)
x <- gsub('[[:punct:]]', ' ', x)

# Identify elements wanted in the original .json and apply a regex
y <- regmatches(x, regexec('.*SubElement2 *(.*?) *Text.*', x))
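
For completeness, pulling out just the captured group (a sketch; regmatches() with regexec() returns, per input string, c(full match, capture groups), or character(0) when nothing matched, hence the guard):

captured <- vapply(y, function(m) if (length(m) >= 2) m[2] else NA_character_, character(1))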

Upvotes: 0

Views: 276

Answers (1)

r2evans

Reputation: 160742

  1. for loops never return anything, so you must save all valuable data yourself.
  2. You call as.data.frame(i), which creates a frame with exactly one element, the filename; probably not what you want to keep.
  3. (Minor) Use fromJSON(file(i),...).
  4. Since you want to capture these into one frame, I suggest something along the lines of:

    out <- lapply(files, function(fn) {
      o <- fromJSON(file(fn), flatten = TRUE)
      as.data.frame(o)$element$subdata$data
    })
    allout <- do.call(rbind.data.frame, out)
    ### alternatives:
    allout <- dplyr::bind_rows(out)
    allout <- data.table::rbindlist(out)
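
To illustrate point 1, a for loop version that saves each result explicitly (a sketch under the same assumptions as above):

    out <- vector("list", length(files))
    for (k in seq_along(files)) {
      o <- fromJSON(file(files[k]), flatten = TRUE)
      out[[k]] <- as.data.frame(o)$element$subdata$data
    }
    allout <- do.call(rbind.data.frame, out)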
    

Upvotes: 2
