Reputation: 33
I’m using the R programming language (and R Studio) having trouble organizing some data that I’m pulling via API so that it’s writeable to a table. I’m using the StubHub API to get a JSON response that contains all ticket listings for a particular event. I can successfully make the call to StubHub, I get the successful response. Here’s the code I am using to grab the response:
# get the content part of the response
msgContent = content(response)
# format to JSON object
jsonContent = jsonlite::fromJSON(toJSON(msgContent),flatten=TRUE,simplifyVector=TRUE)
This JSON object has a node called “listing” and that’s what I’m most interested in, so I set a variable to that part of the object:
friListings = jsonContent $listing
Checking the class of “friListings” I see I have a data.frame:
> class(friListings)
[1] "data.frame"
When I click on this variable in R Studio — View(friListings) — it opens in a new tab and looks pretty and nicely formatted. There are 21 variables (columns) and 609 observations (row). I see null values for certain cells, which is expected.
I would like to write this data.frame out as a table in a file on my computer. When I try to do that, I get this error.
> write.table(friListings,file="data",row.names=FALSE)
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
Looking at other postings, it appears this is happening because my data.frame is actually not “flat” and is a list of lists with different classes and nesting. I validate this by str() on each of the columns in friListings….
> str(friListings[1])
'data.frame': 609 obs. of 1 variable:
$ listingId:List of 609
..$ : int 1138579989
..$ : int 1138969061
..$ : int 1138958138
(this is just the first couple of lines, there are hundreds)
Another example:
> str(friListings[6])
'data.frame': 609 obs. of 1 variable:
$ sellerSectionName:List of 609
..$ : chr "Upper 354 - No View"
..$ : chr "Club 303 - Obstructed/No View"
..$ : chr "Middle 254 - Obstructed/No View"
(this is just the first couple of lines, there are hundreds)
Here is the head of friListings that I am attempting to share using dput from the reproducible example post:
> dput(head(friListings,4))
structure(list(listingId = list(1138579989L, 1138969061L, 1138958138L,
1139003985L), sectionId = list(1552295L, 1552172L, 1552220L,
1552289L), row = list("16", "6", "22", "26"), quantity = list(
1L, 2L, 4L, 1L), sellerSectionName = list("Upper 354 - No View",
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View",
"353"), sectionName = list("Upper 354 - Obstructed/No View",
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View",
"Upper 353 - Obstructed/No View"), seatNumbers = list("21",
"7,8", "13,14,15,16", "General Admission"), zoneId = list(
232917L, 232909L, 232914L, 232917L), zoneName = list("Upper",
"Club", "Middle", "Upper"), listingAttributeList = list(structure(c(204L,
201L), .Dim = c(2L, 1L)), structure(c(4369L, 5370L), .Dim = c(2L,
1L)), structure(c(4369L, 5989L), .Dim = c(2L, 1L)), structure(c(204L,
4369L), .Dim = c(2L, 1L))), listingAttributeCategoryList = list(
structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L,
1L)), structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L,
1L))), deliveryTypeList = list(structure(5L, .Dim = c(1L,
1L)), structure(5L, .Dim = c(1L, 1L)), structure(5L, .Dim = c(1L,
1L)), structure(5L, .Dim = c(1L, 1L))), dirtyTicketInd = list(
FALSE, FALSE, FALSE, FALSE), splitOption = list("0", "0",
"1", "1"), ticketSplit = list("1", "2", "2", "1"), splitVector = list(
structure(1L, .Dim = c(1L, 1L)), structure(2L, .Dim = c(1L,
1L)), structure(c(2L, 4L), .Dim = c(2L, 1L)), structure(1L, .Dim = c(1L,
1L))), sellerOwnInd = list(0L, 0L, 0L, 0L), currentPrice.amount = list(
468.99, 475L, 475L, 550.45), currentPrice.currency = list(
"USD", "USD", "USD", "USD"), faceValue.amount = list(NULL,
NULL, NULL, NULL), faceValue.currency = list(NULL, NULL,
NULL, NULL)), .Names = c("listingId", "sectionId", "row",
"quantity", "sellerSectionName", "sectionName", "seatNumbers",
"zoneId", "zoneName", "listingAttributeList", "listingAttributeCategoryList",
"deliveryTypeList", "dirtyTicketInd", "splitOption", "ticketSplit",
"splitVector", "sellerOwnInd", "currentPrice.amount", "currentPrice.currency",
"faceValue.amount", "faceValue.currency"), row.names = c(NA,
4L), class = "data.frame")
I tried to get around this by going through each column in friListings, unlisting that node, saving to a vector and then doing a cbind to stitch them all together. But, when I do that, I get vectors of different lengths because of the nulls. I took this approach one step further and tried to class each column to force NAs to preserve the nulls, but that’s not working. And, regardless, there’s gotta be a better approach than this. Here's some output to illustrate what happens when I attempt this approach.
# Take the column zoneId and casting it as numeric to force NA
friListings$zoneId<-lapply(friListings$zoneId, as.numeric)
# check the length
> length(friListings$zoneId)
[1] 609
# unlist and check the length... and I lost 11 items
> zoneid <- unlist(friListings$zoneId, use.names=FALSE)
> length(zoneid)
[1] 598
# here's the tail of the column... (because I happen to know that's where the empty values that are being dropped are)
> tail(friListings$zoneId)
[[1]]
numeric(0)
[[2]]
numeric(0)
[[3]]
numeric(0)
[[4]]
numeric(0)
[[5]]
numeric(0)
[[6]]
numeric(0)
I know people work with JSON and R all the time (I'm obviously not one of those people!), so maybe I’m missing something obvious. But I’ve spent 5 hours trying different ways to clean this data and searching the internet for answers. I read the JSON package documentation, too.
I really just want to "flatten" this object so that it’s pretty and structured in the same way the R Studio renders it when I do View(friListings). I'm already passing "flatten=TRUE" in my "fromJSON" call above and it doesn't seem to be doing what I expect. Same with the "simplifyVector=TRUE" (which is TRUE by default according to the docs, but added it for clarity).
Thanks for any insight or guidance you may be able to offer!!!
Upvotes: 2
Views: 2504
Reputation: 54247
You might want to try and adapt this approach:
f <- function(x)
if(is.list(x)) {
unlist(lapply(x, f))
} else {
x[which(is.null(x))] <- NA
paste(x, collapse = ",")
}
df <- as.data.frame(do.call(cbind, lapply(friListings, f)))
write.table(df, tf <- tempfile(fileext = "csv"))
df <- read.table(tf)
str(df)
# 'data.frame': 4 obs. of 21 variables:
# $ listingId : int 1138579989 1138969061 1138958138 1139003985
# $ sectionId : int 1552295 1552172 1552220 1552289
# $ row : int 16 6 22 26
# $ quantity : int 1 2 4 1
# $ sellerSectionName : Factor w/ 4 levels "353","Club 303 - Obstructed/No View",..: 4 2 3 1
# $ sectionName : Factor w/ 4 levels "Club 303 - Obstructed/No View",..: 4 1 2 3
# $ seatNumbers : Factor w/ 4 levels "13,14,15,16",..: 2 3 1 4
# $ zoneId : int 232917 232909 232914 232917
# $ zoneName : Factor w/ 3 levels "Club","Middle",..: 3 1 2 3
# $ listingAttributeList : Factor w/ 4 levels "204,201","204,4369",..: 1 3 4 2
# $ listingAttributeCategoryList: int 1 1 1 1
# $ deliveryTypeList : int 5 5 5 5
# $ dirtyTicketInd : logi FALSE FALSE FALSE FALSE
# $ splitOption : int 0 0 1 1
# $ ticketSplit : int 1 2 2 1
# $ splitVector : Factor w/ 3 levels "1","2","2,4": 1 2 3 1
# $ sellerOwnInd : int 0 0 0 0
# $ currentPrice.amount : num 469 475 475 550
# $ currentPrice.currency : Factor w/ 1 level "USD": 1 1 1 1
# $ faceValue.amount : logi NA NA NA NA
# $ faceValue.currency : logi NA NA NA NA
Upvotes: 1