I'm trying to build a data frame from a series of API calls. Each call returns some JSON, like this:
{"ip":"83.108.241.206","country_code":"NO","country_name":"Norway","region_code":"15","region_name":"Sogn og Fjordane","city":"Øvre Årdal","zipcode":"6884","latitude":61.3167,"longitude":7.8,"metro_code":"","area_code":""}
I would like to compile the results of these calls into a data frame with columns "ip", "country_code", etc., but I'm having trouble efficiently getting each response into a form I can call rbind on.
I'm using a vector of URLs to make the API calls, like this:
> urls <- c("http://freegeoip.net/json/83.108.241.206", "http://freegeoip.net/json/129.118.15.107","http://freegeoip.net/json/189.144.59.71", "http://freegeoip.net/json/24.106.181.190", "http://freegeoip.net/json/213.226.181.3", "http://freegeoip.net/json/84.1.204.89")
> urls
[1] "http://freegeoip.net/json/83.108.241.206"
[2] "http://freegeoip.net/json/129.118.15.107"
[3] "http://freegeoip.net/json/189.144.59.71"
[4] "http://freegeoip.net/json/24.106.181.190"
[5] "http://freegeoip.net/json/213.226.181.3"
[6] "http://freegeoip.net/json/84.1.204.89"
What's the best way to get from URL to JSON to data frame?
I'm copying the "transcript" so you can see the intermediate values and a few of the errors I made. It's not that difficult with a few tools:
> require(RJSONIO) # Used version 1.3-0
> require(downloader) # version 0.3
# probably not necessary, but it can handle a wider range of URL types
Loading required package: downloader
> urls <- c("http://freegeoip.net/json/83.108.241.206",
"http://freegeoip.net/json/129.118.15.107",
"http://freegeoip.net/json/189.144.59.71",
"http://freegeoip.net/json/24.106.181.190",
"http://freegeoip.net/json/213.226.181.3",
"http://freegeoip.net/json/84.1.204.89")
>
> download(urls[1], "temp")
100 225 100 225 0 0 1301 0 --:--:-- --:--:-- --:--:-- 2710 0 --:--:-- --:--:-- --:--:-- 0
# Experience tells me to use `quiet=TRUE`
# to prevent bad interactions with my GUI console display
> df <- fromJSON(file("temp")) #### See below for improved strategy ###
> str(df)
List of 11
$ ip : chr "83.108.241.206"
$ country_code: chr "NO"
$ country_name: chr "Norway"
$ region_code : chr "15"
$ region_name : chr "Sogn og Fjordane"
$ city : chr "Øvre Årdal"
$ zipcode : chr "6884"
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : chr ""
$ area_code : chr ""
> str(as.data.frame(df))
'data.frame': 1 obs. of 11 variables:
$ ip : Factor w/ 1 level "83.108.241.206": 1
$ country_code: Factor w/ 1 level "NO": 1
$ country_name: Factor w/ 1 level "Norway": 1
$ region_code : Factor w/ 1 level "15": 1
$ region_name : Factor w/ 1 level "Sogn og Fjordane": 1
$ city : Factor w/ 1 level "Øvre Årdal": 1
$ zipcode : Factor w/ 1 level "6884": 1
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : Factor w/ 1 level "": 1
$ area_code : Factor w/ 1 level "": 1
> str(as.data.frame(df, stringsAsFactors=FALSE))
'data.frame': 1 obs. of 11 variables:
$ ip : chr "83.108.241.206"
$ country_code: chr "NO"
$ country_name: chr "Norway"
$ region_code : chr "15"
$ region_name : chr "Sogn og Fjordane"
$ city : chr "Øvre Årdal"
$ zipcode : chr "6884"
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : chr ""
$ area_code : chr ""
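As an offline illustration of that coercion step, here is a minimal base-R sketch. The list `parsed` is typed in by hand to mirror the `str(df)` output above; it is an assumption standing in for a live `fromJSON()` call, so the snippet needs neither network access nor RJSONIO. Passing `stringsAsFactors = FALSE` explicitly keeps the text fields as character. Since R 4.0.0 that is also the default, but as the factor output above shows, it was not the default in the R version used for this transcript.

```r
# Hand-typed stand-in for what RJSONIO::fromJSON returned for the first URL
# (hypothetical sample data copied from the str(df) output above).
parsed <- list(ip = "83.108.241.206", country_code = "NO",
               country_name = "Norway", region_code = "15",
               region_name = "Sogn og Fjordane",
               city = "\u00d8vre \u00c5rdal",   # "Øvre Årdal" via \u escapes
               zipcode = "6884", latitude = 61.3167, longitude = 7.8,
               metro_code = "", area_code = "")

# Explicit stringsAsFactors = FALSE keeps text fields as character vectors
# no matter which default the running R version uses.
one_row <- as.data.frame(parsed, stringsAsFactors = FALSE)
str(one_row)
```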
So that's the preparation. Leaving those columns as factors would have caused problems on the first rbind call, so the first result is coerced with stringsAsFactors=FALSE before looping over the remaining URLs:
df <- as.data.frame(fromJSON(file("temp")), stringsAsFactors = FALSE)
for (i in 2:length(urls)) {
  download(urls[i], "temp", quiet = TRUE)
  df <- rbind(df, fromJSON(file("temp"))) }
> df
ip country_code country_name region_code region_name
df "83.108.241.206" "NO" "Norway" "15" "Sogn og Fjordane"
"129.118.15.107" "US" "United States" "TX" "Texas"
"189.144.59.71" "MX" "Mexico" "09" "Distrito Federal"
"24.106.181.190" "US" "United States" "NC" "North Carolina"
"213.226.181.3" "LT" "Lithuania" "57" "Kauno Apskritis"
"84.1.204.89" "HU" "Hungary" "12" "Komárom-Esztergom"
city zipcode latitude longitude metro_code area_code
df "Øvre Årdal" "6884" 61.3167 7.8 "" ""
"Lubbock" "79409" 33.61 -101.8213 "651" "806"
"Mexico" "" 19.4342 -99.1386 "" ""
"Raleigh" "27604" 35.8181 -78.5636 "560" "919"
"Kaunas" "" 54.9 23.9 "" ""
"Környe" "" 47.5467 18.3208 "" ""
Coercing the first result to a data frame with stringsAsFactors=FALSE prevents rbind() from creating a matrix of lists and avoids problems when rbinding new rows onto factor columns.
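The loop in the transcript grows `df` one rbind() at a time. A common alternative, swapped in here and not what the transcript does, is to convert every parsed record to a one-row data frame first and bind them once with `do.call(rbind, ...)`. The sketch below is a minimal base-R version under stated assumptions: the helper `to_row` is hypothetical, and the two records are typed in by hand from the output above so the code runs offline. The commented lines show how the records might be fetched live with the same `download()`/`fromJSON()` calls used in the transcript.

```r
# Hypothetical helper: one parsed JSON record (a named list) -> one-row
# data frame with character text columns.
to_row <- function(rec) as.data.frame(rec, stringsAsFactors = FALSE)

# Two records typed in by hand from the output above (offline stand-ins).
records <- list(
  list(ip = "83.108.241.206", country_code = "NO", country_name = "Norway",
       region_code = "15", region_name = "Sogn og Fjordane",
       city = "\u00d8vre \u00c5rdal", zipcode = "6884",
       latitude = 61.3167, longitude = 7.8, metro_code = "", area_code = ""),
  list(ip = "129.118.15.107", country_code = "US",
       country_name = "United States", region_code = "TX",
       region_name = "Texas", city = "Lubbock", zipcode = "79409",
       latitude = 33.61, longitude = -101.8213, metro_code = "651",
       area_code = "806")
)

# Build all rows, then bind once; growing a data frame inside a loop
# tends to copy it again on every iteration.
geo <- do.call(rbind, lapply(records, to_row))

# With live data, the records would come from the URLs instead, e.g.:
# records <- lapply(urls, function(u) {
#   download(u, "temp", quiet = TRUE)
#   fromJSON(file("temp"))
# })
```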