Ochetski
Ochetski

Reputation: 115

R data.frame to JSON where each data point has column name and value

Hard to describe exactly in one title but what I'm trying to use an API and need data in a very specific Json format.

The data I'm using is simple:

     debtinc loan mortdue  value
6   37.11361 1700   30548  40320
8   36.88489 1800   28502  43034
20  31.58850 2300  102370 120953
26  38.26360 2400   34863  47471

#data sample:
df <- structure(list(debtinc = c(37.113613558, 36.884894093, 31.588503178, 
    38.263600731, 29.681827045, 30.051136286), loan = c(1700L, 1800L, 
    2300L, 2400L, 2400L, 2900L), mortdue = c(30548, 28502, 102370, 
    34863, 98449, 103949), value = c(40320, 43034, 120953, 47471, 
    117195, 112505)), row.names = c(6L, 8L, 20L, 26L, 27L, 35L), class = "data.frame")

And I am trying to transform in this json format. where each row is a single json object in a specific format:

# col1_row1 = numeric
# col1_row2 = char
 
{"inputs": [
               {"name": "<col1_name>", "value": <col1_row1>}, \
               {"name": "<col2_name>", "value": "<col2_row1>"}, \
               {"name": "<col3_name>"} \ # if Value: NA
               ]
}

my real struggle is transforming each column to a "name": "column1" and "value": "col1_val1"

My closes trial:

library("jsonlite")

df_list<- unname(split(test, 1:nrow(test)))

lapply(df_list, function(x) toJSON(list(inputs = x)))


Which gets me this, which is in object type I need, but not in the correct json format columns are key rather than value:

[1] "{\"inputs\":[{\"debtinc\":37.1136,\"loan\":1700,\"mortdue\":30548,\"value\":40320}]}"  
[2] "{\"inputs\":[{\"debtinc\":36.8849,\"loan\":1800,\"mortdue\":28502,\"value\":43034}]}"  
[3] "{\"inputs\":[{\"debtinc\":31.5885,\"loan\":2300,\"mortdue\":102370,\"value\":120953}]}"
[4] "{\"inputs\":[{\"debtinc\":38.2636,\"loan\":2400,\"mortdue\":34863,\"value\":47471}]}"  
[5] "{\"inputs\":[{\"debtinc\":29.6818,\"loan\":2400,\"mortdue\":98449,\"value\":117195}]}" 
[6] "{\"inputs\":[{\"debtinc\":30.0511,\"loan\":2900,\"mortdue\":103949,\"value\":112505}]}"

Upvotes: 0

Views: 193

Answers (2)

Ochetski
Ochetski

Reputation: 115

[Edit] After a year, I've stumbled on this again, and I've overhauled, more reliable and faster than before. If the Value is NA it will return only "name", this work as intended for the api the output payload is aimed to.


write_json_row <- function(row) {

    var_vect <- paste0('{"name": "', colnames(row), '"', 
                      ifelse(sapply(row, is.na), '', 
                      ifelse(sapply(row, is.character), paste0(', "value": ', '"', row, '"'), 
                      ## the "error" string output may give a silent error
                      ## it is here for placeholder because if I add stop() it
                      ## will halt the function, even though it is never returning "error"
                      ifelse(sapply(row, is.numeric), paste0(', "value": ', row), "error"))), ' } ')
    
    out <- paste0('{"inputs": [ ', paste0(var_vect, collapse = ", "), '] }')
    
    return(out)
    
    
}

## apply the function to each row of a DF

out <- by(df, seq_len(nrow(df)), write_json_row)
out

[Before Edit] As an addition to the previous answer, it didn't cover the case when you had character and numeric variables, pivot_longer would fail (my mistake with the data sample).

To fix, I would transform everything to character and then strip the quotes where there is number before or after. This is a poor regex, but there is a way.

library(dplyr)
library(tidyr)

df <- sapply(df, as.character)
df <- as.data.frame(df) 

df %>%
  mutate(id = row_number()) %>%  
  tidyr::pivot_longer(cols = -id) %>%   
  split(x = .[-1], f = .$id) %>%  
  unname %>% 
  lapply(function(x) toJSON(list(inputs = x))) %>%
  unlist %>%
  gsub(r'{"(?=\d)|(?<=\d)"}', "", ., perl = TRUE)

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 145785

Pivoting to long format makes this work. Convieniently pivot_longer defaults to "name" and "value" as column names:

library(dplyr)
library(tidyr)
df %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = -id) %>%
  split(x = .[-1], f = .$id) %>%
  unname %>%
  jsonlite::toJSON(pretty = T)
# [
#   [
#     {
#       "name": "debtinc",
#       "value": 37.1136
#     },
#     {
#       "name": "loan",
#       "value": 1700
#     },
#     {
#       "name": "mortdue",
#       "value": 30548
#     },
#     {
#       "name": "value",
#       "value": 40320
#     }
#   ],
#   [
#     {
#       "name": "debtinc",
#       "value": 36.8849
#     },
#  ...

It's at least a lot closer - from your example I'm having trouble telling exactly where the nesting and "inputs:" need to go.

Upvotes: 1

Related Questions