Reputation: 103
I am trying to merge multiple JSON files into one database, and despite trying all the approaches I found on Stack Overflow, it fails.
The files provide sensor data. The stages I've completed are:
1. Unzip the files - produces json files saved as '.txt' files
2. Remove the old zip files
3. Parse the '.txt' files to remove some bugs in the content: random 3-letter + comma combos at the start of some lines, e.g. 'prm,{...'
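For reference, the clean-up in step 3 can be sketched roughly as below. This is only an illustration: the helper name clean_lines is made up, and it assumes the stray prefix is always exactly three letters plus a comma sitting directly in front of the opening brace.

```r
# Hypothetical helper: strip a leading "xxx," prefix (three letters and a
# comma) when it appears immediately before the opening "{" of a JSON line.
clean_lines <- function(lines) {
  sub("^[A-Za-z]{3},(?=\\{)", "", lines, perl = TRUE)
}

raw <- c('prm,{"sensor":"a","value":1}',   # line with the stray prefix
         '{"sensor":"b","value":2}')       # line that is already clean
cleaned <- clean_lines(raw)
```

Lines without the prefix pass through unchanged, so the helper can be applied to a whole file read with readLines() before writing it back out.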
I've got code which will turn them into data frames individually:
stream <- stream_in(file("1.txt"))
flat <- flatten(stream)
df_it <- as.data.frame(flat)
But when I put it into a function:
df_loop <- function(x) {
  stream <- stream_in(x)
  flat <- flatten(stream)
  df_it <- as.data.frame(flat)
  df_it
}
And then try to run through it:
df_all <- sapply(file.list, df_loop)
I get:
Error: Argument 'con' must be a connection.
Then I've tried to merge the json files with rbind.fill and merge to no avail.
Not really sure where I'm going so terribly wrong so would appreciate any help.
Upvotes: 0
Views: 294
Reputation: 9081
You need a small change in your function. Change the first line of its body to:
stream <- stream_in(file(x))
Explanation
Start by analyzing your original implementation:
stream <- stream_in(file("1.txt"))
Here 1.txt is the file path, which is passed as an input parameter to the file() function. A quick ?file will tell you that it is a
Function to create, open and close connections, i.e., “generalized files”, such as possibly compressed files, URLs, pipes, etc.
Now, if you look at ?stream_in, you will find that it is a
function that implements line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe
The key phrase here is "a connection, such as a socket, url, file or pipe".
Your file.list is just a list of file paths, i.e. character strings. But for stream_in() to work, you need to pass in a connection object, which is what the file() function returns when it is given the file path as a string.
Chaining that together, you need stream_in(file("/path/to/file.txt")).
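To see the difference between a path and a connection, here is a small check. It is just a sketch: the temporary file stands in for your 1.txt, and the variable names are made up for the illustration.

```r
# A temporary file stands in for "1.txt" (assumption for this sketch).
path <- tempfile(fileext = ".txt")
writeLines('{"a":1}', path)

# A plain character path is not a connection ...
path_is_connection <- inherits(path, "connection")   # FALSE

# ... but the object returned by file() is.
con <- file(path)
file_is_connection <- inherits(con, "connection")    # TRUE
close(con)  # release the connection created above
```

This is the same kind of check that produces the "Argument 'con' must be a connection." error when a bare string reaches stream_in().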
Once you do that, your sapply iterates over each path, creates the connection and passes it as input to stream_in().
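Putting it together, a minimal sketch of the whole loop might look like the following. A few things here are my assumptions rather than part of your code: the function name read_sensor_file, the two temporary files that stand in for your real file.list, and the use of lapply() instead of sapply() (sapply may try to simplify the result, whereas lapply always returns a list of data frames). It uses plyr::rbind.fill, which you mentioned, to combine data frames whose columns differ.

```r
library(jsonlite)

# Read one line-delimited JSON file into a flat data frame.
# file(path) turns the character path into a connection for stream_in().
read_sensor_file <- function(path) {
  stream <- stream_in(file(path), verbose = FALSE)
  flatten(stream)
}

# Two small temporary files stand in for the real sensor files (assumption).
tmp <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c('{"id":1,"temp":20.5}', '{"id":2,"temp":21.0}'), tmp[1])
writeLines('{"id":3,"humidity":40}', tmp[2])

# One data frame per file, kept as a list.
df_list <- lapply(tmp, read_sensor_file)

# Row-bind the list; columns missing from a file are filled with NA.
df_all <- plyr::rbind.fill(df_list)
```

With your own data you would pass file.list in place of tmp.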
Hope that helps!
Upvotes: 1