thiagoveloso
thiagoveloso

Reputation: 2763

Assign categories from JSON column in data table

I have a data.table that looks like dt (provided at the end), which includes columns EI and CutsLabsCV.

CutsLabsCV is a JSON of two values indicating cuts and labels that should be used for classifying my data set.

I need to create a new column EIRange, that classifies the EI column based on cuts and labels provided on CutsLabsCV.

The expected result should be:

          EI                                    CutsLabsCV EIRange
       <num>                                        <list>  <fctr>
 1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120

How can I achieve this in data.table? I need the most efficient solution since my real dataset exceeds 2M rows.

Sample dataset:

dt <- data.table(EI = c(101.91625, 122.03178865, 122.03178865,
                        122.03178865, 122.03178865, 122.03178865, 122.03178865, 122.03178865,
                        109.521980125, 109.521980125),
                 CutsLabsCV = list(c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]")))

Upvotes: 4

Views: 85

Answers (2)

Friede
Friede

Reputation: 7979

Avoiding sapply(), unlist() + matrix() + sub() might be an alternative

dt[, EIRange := sub('.*"(\\d+-\\d+)".*', '\\1', matrix(unlist(CutsLabsCV, FALSE, FALSE), ncol = 2L, byrow = TRUE)[, 2L])]
> dt 
      EI                                        CutsLabsCV  EIRange
       <num>                                        <list>  <char>
 1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120

Upvotes: 2

Waldi
Waldi

Reputation: 41260

One possible solution :

dt[,EIRange:=sapply(CutsLabsCV,\(x) jsonlite::fromJSON(x[2])[2])]
dt

#           EI                                    CutsLabsCV EIRange
#        <num>                                        <list>  <char>
#  1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
# 10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120

Upvotes: 3

Related Questions