Reputation: 35
I am very new to JSON files. I scraped a text file containing several million JSON objects, such as:
{
"created_at":"Mon Oct 14 21:04:25 +0000 2013",
"default_profile":true,
"default_profile_image":true,
"description":"...",
"followers_count":5,
"friends_count":560,
"geo_enabled":true,
"id":1961287134,
"lang":"de",
"name":"Peter Schmitz",
"profile_background_color":"C0DEED",
"profile_background_image_url":"http://abs.twimg.com/images/themes",
"utc_offset":-28800,
...
}
{
"created_at":"Fri Oct 17 20:04:25 +0000 2015",
...
}
I want to extract the columns into a data frame in R:
Variable Value
created_at X
default_profile Y
…
Overall, I am after something similar to what is done in Python here (multiple Json objects in one file extract by python). If anyone has an idea or a suggestion, help would be much appreciated! Thank you!
Upvotes: 2
Views: 1239
Reputation: 25395
Here is an example of how you could approach it, using two objects. I assume you were able to read the JSON from the file into a string; otherwise see here.
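In case the text is still on disk, reading it into a single string might look like the sketch below. The temporary file and its two tiny objects are hypothetical stand-ins for the scraped file, and `file_text` is a name chosen here for illustration.

```r
# Hypothetical stand-in for the scraped file: two objects written to a temp file.
path <- tempfile(fileext = ".txt")
writeLines(c('{', '"lang":"de",', '"followers_count":5', '}',
             '{', '"lang":"en",', '"followers_count":7', '}'), path)

# Read every line and join the lines back into one string.
file_text <- paste(readLines(path, warn = FALSE), collapse = "\n")
```

With several million objects this holds the whole file in memory at once, so reading it in chunks may be necessary. The example that follows uses a literal string instead.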
myjson = '{"created_at": "Mon Oct 14 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
{"created_at": "Mon Oct 15 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
'
library("rjson")
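# Split the text into a list of all JSON objects. The '!x!x!' marker is arbitrary; there may be better ways of keeping the braces while splitting.
# Note: splitting on '}' only works for flat objects, i.e. no nested objects and no '}' inside string values.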
my_json_objects = head(strsplit(gsub('\\}','\\}!x!x!', myjson),'!x!x!')[[1]],-1)
# read the text as JSON objects
json_data <- lapply(my_json_objects, function(x) {fromJSON(x)})
# Transform to dataframes
json_data <- lapply(json_data, function(x) {data.frame(val=unlist(x))})
Output:
[[1]]
val
created_at Mon Oct 14 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
[[2]]
val
created_at Mon Oct 15 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
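If the objects can contain nested objects, or if a single data frame with one row per object is preferred, a variation along these lines may work. It is only a sketch under stated assumptions: the two sample objects are made up, every object is assumed to have the same fields, and no string value is assumed to contain a `}` followed directly by a `{`. Instead of splitting on `}`, it puts commas between the objects and wraps them in `[` and `]`, so that `fromJSON` can parse the whole text as one array.

```r
library("rjson")

# Made-up sample: note the '}' inside a string and the nested "geo" object,
# both of which would be cut apart by a plain split on '}'.
txt <- '{"name": "a}b", "followers_count": 5, "geo": {"x": 1}}
{"name": "c", "followers_count": 7, "geo": {"x": 2}}'

# Put a comma between a closing brace and the next opening brace,
# then wrap everything in brackets to form one JSON array.
array_text <- paste0("[", gsub("\\}[[:space:]]*\\{", "},{", txt), "]")
objects <- fromJSON(array_text)

# One row per object. unlist() flattens nested fields into names such as
# "geo.x" and converts mixed values to character, so every column is character.
rows <- lapply(objects, function(x) {
  as.data.frame(as.list(unlist(x)), stringsAsFactors = FALSE)
})
users <- do.call(rbind, rows)
```

Here `users` has the columns `name`, `followers_count` and `geo.x`, one row per object. Objects with differing fields would need their missing columns filled in before `rbind`.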
Hope this helps!
Upvotes: 2