Delist each row in a dataframe using apply function

Question

I have an input file named in.json. You can find the content of this file here.

Using this answer, I am trying to convert the JSON to CSV with this code:

# rjson provides fromJSON(file = ...); loading RJSONIO as well only masks names
library(rjson)
filename2 <- "C:/Users/Desktop/in.json"
json_data <- fromJSON(file = filename2)

json_data <- lapply(json_data, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

json <- do.call("rbind", json_data)

df <- json

write.csv(df,file='C:/Users/Desktop/final.csv', row.names=FALSE)

However, when I run nrow(df) I get only 2 rows, whereas I expect one row per project id, so there should be more rows.

cmbarbu · Accepted Answer

The JSON you provide as an example does indeed contain only two objects in its top-level array. The structure is shown faithfully by a call to str:

> str(json_data,max.level=2)
List of 2
 $ :List of 3
  ..$ projects  :List of 1
  ..$ total_hits: num 12596
  ..$ seed      : chr "776766" 
 $ :List of 3
  ..$ projects  :List of 16
  ..$ total_hits: num 12596
  ..$ seed      : chr "776766"
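
The same effect can be reproduced on made-up data. In this sketch, toy is a hypothetical stand-in for json_data (the real in.json is not reproduced here). Flattening each top-level object yields one vector per object, so rbind can only ever return two rows, however many projects are nested inside.

```r
# Hypothetical stand-in for json_data: two top-level objects, three projects in total
toy <- list(
  list(projects = list(list(id = 1, name = "a")),
       total_hits = 3, seed = "42"),
  list(projects = list(list(id = 2, name = "b"), list(id = 3, name = "c")),
       total_hits = 3, seed = "42")
)

# The unlist step from the question: one flat vector per top-level object
flat <- lapply(toy, unlist)
sapply(flat, length)   # the two vectors have different lengths

# rbind stacks the two vectors (recycling the shorter one, hence the
# suppressed warning), so the row count equals the number of top-level objects
toy_rows <- suppressWarnings(do.call("rbind", flat))
nrow(toy_rows)
```

The row count follows the outer array, not the number of projects, which is why the nested projects lists have to be pulled out first.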

Assuming you want one row per project id and don't mind losing "total_hits", you simply need to unlist the first two levels of the JSON:

 unlisted <- unlist(unlist(json_data,recursive=FALSE),recursive=FALSE)

And then select the items whose names start with projects:

 projects <- unlisted[grep("^projects",names(unlisted))]

You can then flatten each project into a named vector using:

data <- lapply(projects,unlist)
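
Here is a small sketch of these three steps on made-up data. The toy list is a hypothetical stand-in for json_data, and its field names (id, name) are assumptions chosen for illustration only.

```r
# Hypothetical stand-in for json_data; the second object has two projects,
# one of which lacks the "name" field
toy <- list(
  list(projects = list(list(id = 1, name = "a")),
       total_hits = 3, seed = "42"),
  list(projects = list(list(id = 2, name = "b"), list(id = 3)),
       total_hits = 3, seed = "42")
)

# Step 1: remove the two outer levels without flattening the projects themselves
unlisted <- unlist(unlist(toy, recursive = FALSE), recursive = FALSE)
names(unlisted)   # project entries get names starting with "projects"

# Step 2: keep only the project entries
projects <- unlisted[grep("^projects", names(unlisted))]

# Step 3: flatten each project into a named vector
data <- lapply(projects, unlist)
length(data)      # one element per project
```

Each element of data is now a named vector, but the vectors do not all share the same names, which is what makes the next step necessary.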

Row-binding is trickier because the projects do not all have the same fields filled in, so you need to rely on the names. The following is one of many possible solutions, and probably not the optimal one:

# list all the names in all projects
allNames <- unique(unlist(lapply(data,names)))
# have a model row
modelRow <- rep(NA,length(allNames))
names(modelRow)<-allNames

# the function to change your list into a row  following modelRow structure
rowSettingFn <- function(project){
    row <- modelRow
    for(iItem in seq_along(project)){
        row[names(project)[iItem]] <- project[[iItem]]
    }
    return(row)
}

# change your data into a matrix; sapply returns one column per project,
# so transpose to get one row per project
dataMat <- t(sapply(data,rowSettingFn))
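
To see the padding idea at work, here is a minimal sketch on made-up data. The two-project list data and its field names are hypothetical, and the loop is replaced by a vectorised assignment by name, which is a variation on rowSettingFn rather than the exact code above. The padded rows are then bound into a data frame with one row per project.

```r
# Hypothetical flattened projects with different fields filled in
data <- list(
  projects1 = c(id = "1", name = "a"),
  projects2 = c(id = "2", status = "open")
)

# Model row holding every field name seen in any project, filled with NA
allNames <- unique(unlist(lapply(data, names)))
modelRow <- rep(NA, length(allNames))
names(modelRow) <- allNames

# Copy each project's values into a fresh model row, matching by name
padded <- lapply(data, function(project) {
  row <- modelRow
  row[names(project)] <- project
  row
})

# All rows now have the same length and names, so rbind is safe
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
nrow(df)   # one row per project

# The result could then be written out, for example:
# write.csv(df, file = "final.csv", row.names = FALSE)
```

Fields that a project lacks stay NA in its row, so the CSV keeps a consistent set of columns.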
