data table subset using position and name

Question

I am trying to subset a data.table to its numeric columns so that I can compute a five-number summary on the numeric variables only. However, I also need to group by an id variable. The approach I tried does not let me use the numeric subset together with the id variable, because id is not part of that subset. I know that data.table has the .SD symbol, but I cannot find the right combination of apply functions and grouping. The id variable is not numeric and cannot be coerced to numeric; it is also not unique in my data.table.

Here is what I have tried:

library(data.table)
library(magrittr)

dt <- data.table(num1 = rep(1, 10), 
      num2 = rep(2, 10), 
      num3 = rep(100, 10), 
      id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"), 
      char1 = rep("a", 10), 
      char2 = rep("b", 10))

numeric_variables <- 
  lapply(dt, is.numeric) %>% 
  unlist() %>% 
  as.vector()

dt[, numeric_variables, with = FALSE]

numeric_dt <- dt[, numeric_variables, with = FALSE]

dt_summary <- 
  apply(numeric_dt[, grep("num", names(numeric_dt), value = TRUE), with = FALSE], 
        2, 
        fivenum) %>% 
  as.data.frame()

rownames(dt_summary) <- 
  c("Min", "Q1", "Med", "Q3", "Max")

dt_summary

dt[, .(numeric_variables, id), with = FALSE]

The final line does not work because id is not among the numeric_variables I selected. If someone could point me to the correct way of combining by (or a tapply-style function) with .SD, I would appreciate it.

NOTE: This is part of a larger program where the user can either select one id to look at or compare two ids at once. So it needs to work for one group or for many groups (eventually).
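
For reference, here is a minimal sketch of the kind of result I am after, assuming that restricting .SD with .SDcols and grouping with by = id is the right direction (I have not confirmed that this is the recommended approach). The names num_cols, stat and two_ids are only illustrative and are not part of my real program.

```r
library(data.table)

dt <- data.table(num1 = rep(1, 10),
                 num2 = rep(2, 10),
                 num3 = rep(100, 10),
                 id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"),
                 char1 = rep("a", 10),
                 char2 = rep("b", 10))

# Names of the numeric columns (num_cols is an illustrative name)
num_cols <- names(dt)[sapply(dt, is.numeric)]

# Five-number summary per id: .SD contains only the columns listed in
# .SDcols, and by = id repeats the computation for every group.
# The stat column labels the five rows produced for each id.
dt_summary <- dt[, c(list(stat = c("Min", "Q1", "Med", "Q3", "Max")),
                     lapply(.SD, fivenum)),
                 by = id, .SDcols = num_cols]

# Same idea restricted to one or two ids chosen by the user
two_ids <- dt[id %in% c("2b", "5b"),
              c(list(stat = c("Min", "Q1", "Med", "Q3", "Max")),
                lapply(.SD, fivenum)),
              by = id, .SDcols = num_cols]
```

Because the id filter goes in i and the grouping goes in by, the same call would cover a single id or several ids without changing its shape.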
