Reputation:
Why do i keep getting an error when i try to create a table in R and then export that file to excel. i am getting an error:
Error in write.table(data1, "data1.csv", col.names = NA, sep = ",", dec = ".", : unimplemented type 'list' in 'EncodeElement'
I am aware i will need to flatten the list, but i am not sure how i would flatten it.
My Code:
library(httr)
library(jsonlite)
library(xml2)
library("rio")
library("magrittr")
query <- "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json"
out <- GET(url=query)
http_status(out)
data1 <- content(out)
data1 <- fromJSON( "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json")
data1 <- as.data.frame(data1)
write.table(data1, file="data1.csv", row.names=FALSE, na="", col.names = FALSE, sep=",")
View(data1)
setwd("F:/")
write.csv(data1,'data1.csv')
Upvotes: 3
Views: 6982
Reputation: 2920
Converting the content of the httr::GET()
request from json into an r object required a few data manipulation steps prior to you being able to export as a csv file.
httr::GET()
returns multiple objects# install necessary packages
install.packages( pkgs = c( "httr", "jsonlite", "magrittr" ))
# load necessary packages
library( httr )
library( jsonlite )
library( magrittr )
# store query
query <- "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json"
# GET the query
out <- httr::GET( url = query )
# base method
# Convert content from raw bytes to character
contents.out.base <- base::rawToChar( x = out$content )
# examine the first 30 characters
# from the contents in JSON form
base::substr( x = contents.out.base
, start = 0
, stop = 30
)
# [1] "{\"Sites\":{\"Site\":[{\"@LocalAuth"
# transfrom from JSON string
# into a data frame
# set 'flatten' equal to TRUE
# to break out lists into individual columns
contents.out.base.df <-
jsonlite::fromJSON( txt = contents.out.base
, flatten = TRUE
)
# view the data
class( contents.out.base.df ) # [1] "list"
# Interesting! It didn't return a data frame
names( contents.out.base.df ) # [1] "Sites"
names( contents.out.base.df$Sites ) # [1] "Site"
names( contents.out.base.df$Sites$Site )
# [1] "@LocalAuthorityCode" "@LocalAuthorityName" "@SiteCode"
# [4] "@SiteName" "@SiteType" "@DateClosed"
# [7] "@DateOpened" "@Latitude" "@Longitude"
# [10] "@LatitudeWGS84" "@LongitudeWGS84" "@DataOwner"
# [13] "@DataManager" "@SiteLink" "Species"
# Note that 'Species' doesn't contain an '@' in front of it
# Why?
lapply( X = contents.out.base.df$Sites$Site, FUN = class )
# $`@LocalAuthorityCode`
# [1] "character"
#
# $`@LocalAuthorityName`
# [1] "character"
#
# $`@SiteCode`
# [1] "character"
#
# $`@SiteName`
# [1] "character"
#
# $`@SiteType`
# [1] "character"
#
# $`@DateClosed`
# [1] "character"
#
# $`@DateOpened`
# [1] "character"
#
# $`@Latitude`
# [1] "character"
#
# $`@Longitude`
# [1] "character"
#
# $`@LatitudeWGS84`
# [1] "character"
#
# $`@LongitudeWGS84`
# [1] "character"
#
# $`@DataOwner`
# [1] "character"
#
# $`@DataManager`
# [1] "character"
#
# $`@SiteLink`
# [1] "character"
#
# $Species
# [1] "list"
# save contents.out.base.df$Sites$Site as its own data frame
# without $Species
website.df <-
contents.out.base.df$Sites$Site[
, which( colnames( contents.out.base.df$Sites$Site ) != "Species" )
]
# check dim
dim( website.df ) # [1] 212 14
# view the first six rows
head( x = website.df )
#' @LocalAuthorityCode @LocalAuthorityName @SiteCode
#' 1 1 Barking and Dagenham BG3
#' 2 1 Barking and Dagenham BG1
#' 3 1 Barking and Dagenham BG2
#' 4 2 Barnet BN2
#' 5 2 Barnet BN3
#' 6 2 Barnet BN1
#' @SiteName @SiteType
#' 1 Barking and Dagenham - North Street Kerbside
#' 2 Barking and Dagenham - Rush Green Suburban
#' 3 Barking and Dagenham - Scrattons Farm Suburban
#' 4 Barnet - Finchley Urban Background
#' 5 Barnet - Strawberry Vale Urban Background
#' 6 Barnet - Tally Ho Corner Kerbside
#' @DateClosed @DateOpened @Latitude
#' 1 2011-05-25 00:00:00 2007-03-16 00:00:00 51.540444
#' 2 1999-11-02 00:00:00 51.563752
#' 3 1999-10-17 00:00:00 51.529389
#' 4 2012-04-20 00:00:00 2000-08-09 13:00:00 51.591901
#' 5 2002-05-15 00:00:00 2000-08-14 14:00:00 51.6008848453589
#' 6 2012-04-20 00:00:00 1998-12-20 12:00:00 51.614675
#' @Longitude @LatitudeWGS84 @LongitudeWGS84
#' 1 0.074418 6717454.5833 8284.17386585
#' 2 0.177891 6721627.34498 19802.7355367
#' 3 0.132857 6715476.18683 14789.5735883
#' 4 -0.205992 6726669.62886 -22930.9245475
#' 5 -0.172297542087178 6728279.54795 -19180.0746501
#' 6 -0.176607 6730751.38494 -19659.8013105
#' @DataOwner @DataManager
#' 1 Barking and Dagenham King's College London
#' 2 Barking and Dagenham King's College London
#' 3 Barking and Dagenham King's College London
#' 4 Barnet King's College London
#' 5 Barnet King's College London
#' 6 Barnet King's College London
#' @SiteLink
#' 1 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG3
#' 2 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
#' 3 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
#' 4 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN2
#' 5 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN3
#' 6 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN1
website.df
to be reshaped into a Long Data FrameTL;DR the final data frame will have more than 212 rows.
One local authority, per an individual site, is able to identify more than one species.
This one-to-many relationship results in the final version of website.df
to be reshaped into 'long' format, where one local authority's information may repeat due to them identifying more than one type of species at a site.
To combine each object within contents.out.base.df$Site$Sites$Species
with its corresponding row in website.df
, I used a counter
object. Prior to the use of lapply()
, counter
is set to a value of zero.
The use of <<-
- the super assignment operator - allowed me to extract the corresponding website.df
row each time a new object within contents.out.base.df$Site$Sites$Species
was called. Reading Using a counter inside an apply structured loop in R was very helpful in learning how to do this properly.
Note: Using cbind()
to combine the objects within contents.out.base.df$Site$Sites$Species
results in several warnings()
. The SO post cbind warnings : row names were found from a short variable and have been discarded
reveals that cbind()
resulted in repeating row names. To prevent duplicate row names, it disregarded them.
# create counter
# and set its value to zero
counter <- 0
# construct the column binding
# and replace the objects within the list
# with the concated version of
# that particular object's row in website.df
contents.out.base.df$Sites$Site$Species <-
lapply( X = contents.out.base.df$Sites$Site$Species
, FUN = function( i ){
# add to counter
counter <<- counter + 1
# add columns from
# the counter row in website.df
# onto the i object in X
cbind(
website.df[ counter , ]
, i
, stringsAsFactors = FALSE
)
} # end of anonymous function
)
# There were 50 or more warnings (use warnings() to see the first 50)
warnings()
# Warning messages:
# 1: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
After collapsing the objects in contents.out.base.df$Site$Sites$Species
into one data frame, I cleaned up the row and column names of website.df
. Finally, website.df
is ready to be exported into your working directory using the write.csv()
function.
# collapse the individual objects
# in the list into one data frame
website.df <-
data.frame(
do.call( what = rbind
, args = contents.out.base.df$Sites$Site$Species
)
, stringsAsFactors = FALSE
)
# check dim
dim( website.df ) # [1] 524 18
# rename the rows
rownames( x = website.df ) <-
as.character( x = 1:nrow( x = website.df ) )
# Make syntactically valid column names
colnames( x = website.df ) <-
base::gsub( pattern = "X."
, replacement = ""
, x = colnames( website.df )
)
# view the first six rows
head( x = website.df )
# LocalAuthorityCode LocalAuthorityName SiteCode
# 1 1 Barking and Dagenham BG3
# 2 1 Barking and Dagenham BG1
# 3 1 Barking and Dagenham BG1
# 4 1 Barking and Dagenham BG2
# 5 1 Barking and Dagenham BG2
# 6 2 Barnet BN2
# SiteName SiteType
# 1 Barking and Dagenham - North Street Kerbside
# 2 Barking and Dagenham - Rush Green Suburban
# 3 Barking and Dagenham - Rush Green Suburban
# 4 Barking and Dagenham - Scrattons Farm Suburban
# 5 Barking and Dagenham - Scrattons Farm Suburban
# 6 Barnet - Finchley Urban Background
# DateClosed DateOpened Latitude Longitude
# 1 2011-05-25 00:00:00 2007-03-16 00:00:00 51.540444 0.074418
# 2 1999-11-02 00:00:00 51.563752 0.177891
# 3 1999-11-02 00:00:00 51.563752 0.177891
# 4 1999-10-17 00:00:00 51.529389 0.132857
# 5 1999-10-17 00:00:00 51.529389 0.132857
# 6 2012-04-20 00:00:00 2000-08-09 13:00:00 51.591901 -0.205992
# LatitudeWGS84 LongitudeWGS84 DataOwner
# 1 6717454.5833 8284.17386585 Barking and Dagenham
# 2 6721627.34498 19802.7355367 Barking and Dagenham
# 3 6721627.34498 19802.7355367 Barking and Dagenham
# 4 6715476.18683 14789.5735883 Barking and Dagenham
# 5 6715476.18683 14789.5735883 Barking and Dagenham
# 6 6726669.62886 -22930.9245475 Barnet
# DataManager
# 1 King's College London
# 2 King's College London
# 3 King's College London
# 4 King's College London
# 5 King's College London
# 6 King's College London
# SiteLink
# 1 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG3
# 2 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
# 3 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
# 4 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
# 5 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
# 6 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN2
# SpeciesCode SpeciesDescription DateMeasurementStarted
# 1 NO2 Nitrogen Dioxide 2008-01-01 00:00:00
# 2 NO2 Nitrogen Dioxide 2008-01-01 00:00:00
# 3 SO2 Sulphur Dioxide 1999-10-23 00:00:00
# 4 NO2 Nitrogen Dioxide 2007-11-21 00:00:00
# 5 PM10 PM10 Particulate 1999-10-17 00:00:00
# 6 NO2 Nitrogen Dioxide 2008-01-01 00:00:00
# DateMeasurementFinished
# 1 2011-05-25 00:00:00
# 2
# 3
# 4
# 5
# 6 2012-04-20 00:00:00
# Export as CSV
write.csv( x = website.df
, file = "web_scrape.csv"
, row.names = FALSE
)
# end of script #
Using sessionInfo()
.
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] magrittr_1.5 jsonlite_1.5 httr_1.3.1
loaded via a namespace (and not attached):
[1] compiler_3.4.3 R6_2.2.2 rgdal_1.2-16 tools_3.4.3
[5] sp_1.2-7 curl_3.1 yaml_2.1.16 grid_3.4.3
[9] lattice_0.20-35
Upvotes: 1
Reputation: 5017
Try this:
data1<-data.frame(lapply(data1, as.character), stringsAsFactors=FALSE)
write.table(data1, file="data1.csv", row.names=FALSE, na="", col.names = FALSE, sep=",")
Upvotes: 4