Foothill_trudger

Reputation: 103

Extracting multiple JSON files into one dataframe

I am trying to merge multiple JSON files into one data frame, but despite trying every approach I've found on SO, it fails.

The files provide sensor data. The stages I've completed are:

1. Unzip the files, which produces JSON files saved as '.txt' files
2. Remove the old zip files
3. Parse the '.txt' files to remove some bugs in the content: random three-letter-plus-comma combos at the start of some lines, e.g. 'prm,{...'
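For reference, the three preparation steps could be scripted roughly as below. This is a minimal sketch under stated assumptions, not the asker's actual code: the folder argument `data_dir` and the helper names `clean_json_lines()` and `prepare_files()` are hypothetical, and the regex assumes the bad prefix is always exactly three letters plus a comma sitting directly before a `{`.

```r
# Sketch of steps 1-3. Assumptions (hypothetical, adjust to your setup):
# the zip archives sit in `data_dir`, and the bad prefix is exactly three
# letters followed by a comma just before a JSON object, e.g. 'prm,{...'.

# Step 3 helper: drop a leading "abc," in front of "{" on any line.
clean_json_lines <- function(lines) {
  sub("^[A-Za-z]{3},\\{", "{", lines)
}

prepare_files <- function(data_dir) {
  zips <- list.files(data_dir, pattern = "\\.zip$", full.names = TRUE)
  # Step 1: extract every archive into the same folder.
  for (z in zips) utils::unzip(z, exdir = data_dir)
  # Step 2: remove the original zip files.
  file.remove(zips)
  # Step 3: rewrite each extracted .txt file with the cleaned lines.
  txts <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
  for (f in txts) {
    writeLines(clean_json_lines(readLines(f, warn = FALSE)), f)
  }
  invisible(txts)
}
```

Lines that do not start with the three-letter prefix pass through `sub()` unchanged, so it is safe to run over every line of every file.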

I've got code that turns each file into a data frame individually:

library(jsonlite)  # provides stream_in() and flatten()

stream <- stream_in(file("1.txt"))
flat <- flatten(stream)
df_it <- as.data.frame(flat)

But when I put it into a function:

df_loop <- function(x) {
  stream <- stream_in(x)
  flat <- flatten(stream)
  df_it <- as.data.frame(flat)
  df_it
}

And then try to run it over all the files:

df_all <- sapply(file.list, df_loop)

I get:

Error: Argument 'con' must be a connection.

I've also tried merging the JSON files with rbind.fill and merge, to no avail.

I'm not sure where I'm going so terribly wrong, so I'd appreciate any help.

Upvotes: 0

Views: 294

Answers (1)

Vivek Kalyanarangan

Reputation: 9081

You need a small change in your function. Change the first line to:

stream <- stream_in(file(x))

Explanation

Start by analyzing your original implementation:

stream <- stream_in(file("1.txt"))

Here, "1.txt" is the file path, passed as an argument to the file() function. A quick ?file will tell you that file() is a

Function to create, open and close connections, i.e., “generalized files”, such as possibly compressed files, URLs, pipes, etc.

Now if you run ?stream_in, you will find that it is a

function that implements line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe

The key point: stream_in() reads from a connection (a socket, url, file or pipe), not from a file path.

Your file.list is just a list of file paths (character strings, to be specific). For stream_in() to work, you need to pass it a connection object instead, which is what file() returns when you give it a file path as a string.
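To see the difference between a path and a connection concretely, here is a small sketch. The temporary file is an assumption that stands in for one entry of file.list; everything else uses jsonlite and base R as they ship.

```r
library(jsonlite)

# A temporary file stands in for one entry of file.list.
path <- tempfile(fileext = ".txt")
writeLines('{"id":1,"temp":20.5}', path)

is.character(path)                 # TRUE: a path is only a string
# Passing the string directly reproduces the error from the question.
err <- tryCatch(stream_in(path), error = function(e) conditionMessage(e))
print(err)

con <- file(path)                  # wrap the path in a connection
inherits(con, "connection")        # TRUE: this is what stream_in() expects
# stream_in() opens the unopened connection itself and closes it when done.
df <- stream_in(con, verbose = FALSE)
nrow(df)                           # 1
```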

Chaining the two together, what you need is stream_in(file("/path/to/file.txt")).

Once you make that change, sapply iterates over each path, and df_loop creates the file connection and passes it to stream_in().
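Putting it together, a sketch of the whole read-and-combine step might look like this. Two changes go beyond the one-line fix and are suggestions, not requirements: lapply() replaces sapply(), because sapply() may try to simplify a list of equally shaped data frames into a list-matrix; and jsonlite's rbind_pages() stacks the per-file data frames, filling columns that are missing from some files with NA (the job rbind.fill was meant to do here). The helper name read_all() and the "data" folder are hypothetical.

```r
library(jsonlite)

# Read one NDJSON file (one JSON object per line) into a flat data frame.
# file(x) turns the path string into a connection, which stream_in() requires.
df_loop <- function(x) {
  stream <- stream_in(file(x), verbose = FALSE)
  flat <- flatten(stream)
  as.data.frame(flat)
}

# Hypothetical helper: read every file, then stack the per-file data frames.
# rbind_pages() fills columns that are missing from some files with NA.
read_all <- function(paths) {
  dfs <- lapply(paths, df_loop)
  rbind_pages(dfs)
}

# Usage, assuming the cleaned .txt files live in a folder called "data":
# file.list <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
# df_all <- read_all(file.list)
```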

Hope that helps!

Upvotes: 1
