Reputation: 87
I have a very messy JSON file (lists inside of lists) that I'm trying to convert to an R dataframe (part of the reason to convert the file is that I need to export it into a .csv file). Here is a sample of the data (https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=0). I tried this solution (Parse nested JSON to Data Frame in R), but that got rid of many of my columns. Below is the code I have so far:
library("twitteR")
library ("streamR")
library("rjson")
json_file <- "20140909-20141010_10zdfxhqf0_2014_09_09_01_00_activities.json"
json_data <- fromJSON(file=json_file) #convert to r list
str (json_data) #list of 16 objects
#unlist elements
tweets.i <- lapply(json_data, function(x){ unlist(x)})
tweets <- do.call("rbind", tweets.i)
tweets <- as.data.frame(tweets)
library(plyr)
tweets <- rbind.fill(lapply(tweets.i,
function(x) do.call("data.frame", as.list(x))
))
Anyone have a way to convert the file to an R dataframe without losing all the info? I'm open to using Python to do this work to, I just don't have the expertise to figure out how to code it.
Upvotes: 1
Views: 661
Reputation: 6579
I like the answer provided above as a really quick way to get everything. You could try tidyjson, but it also will not be efficient since it requires pre-knowledge of the structure. listviewer::jsonedit
might help visualize what you are working with.
#devtools::install_github("timelyportfolio/listviewer")
library(listviewer)
jsonedit(readLines(
"https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
)[2])
Perhaps a data.frame
really isn't the best structure, but it really depends on what you are trying to accomplish.
This is just a sample to hopefully show you how it might look.
library(tidyjson)
library(dplyr)
json <- readLines(
"https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
)
json %>%
{
Filter(
function(x){return (nchar(x) != 0)}
,.
)
} %>%
as.tbl_json() %>%
spread_values(
id = jstring("id")
,objectType = jstring("objectType")
,link = jstring("link")
,body = jstring("body")
,favoritesCount = jstring("favoritesCount")
,twitter_filter_level = jstring("twitter_filter_level")
,twitter_lang = jstring("twitter_lang")
,retweetCount = jnumber("retweetCount")
,verb = jstring("verb")
,postedTime = jstring("postedTime")
# from actor object in the JSON
,actor_objectType = jstring("actor","objectType")
,actor_id = jstring("actor","id")
,actor_link = jstring("actor","link")
,actor_displayName = jstring("actor","displayName")
,actor_image = jstring("actor","image")
,actor_summary = jstring("actor","summary")
,actor_friendsCount = jnumber("actor","friendsCount")
,actor_followersCount = jnumber("actor","followersCount")
) %>%
# careful once you enter you can't go back up
enter_object("actor","links") %>%
gather_array( ) %>%
spread_values(
actor_href = jstring("href")
)
Upvotes: 2
Reputation: 54237
This is not very efficient, but it may work for you:
download.file("https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1", destfile = tf <- tempfile(fileext = ".json"))
txt <- readLines(tf)
library(jsonlite)
library(plyr)
df <- do.call(plyr::rbind.fill, lapply(txt[txt != ""], function(x) as.data.frame(t(unlist(fromJSON(x))))))
Upvotes: 2