dc3
dc3

Reputation: 188

data table subset using position and name

I am trying to subset a data table by the numeric values so I can perform a five number summary on the numeric variables only. However, I also need to group the variables. The way that I was trying to do it does not allow me to use the subset and the id variable that is not part of the subset. I know that data table has the .SD command, but I cannot seem to get the right combination of apply functions and group in data table. The id variable is not numeric and cannot be coerced into being numeric; it is also not unique in my data table.

Here is what I have tried:

library(data.table)
library(magrittr)

dt <- data.table(num1 = rep(1, 10), 
      num2 = rep(2, 10), 
      num3 = rep(100, 10), 
      id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"), 
      char1 = rep("a", 10), 
      char2 = rep("b", 10))

numeric_variables <- 
  lapply(dt, is.numeric) %>% 
  unlist() %>% 
  as.vector()

dt[, numeric_variables, with = FALSE]

dt_summary <- 
  apply(dt[, numeric_variables, with = FALSE][, grep("num", 
                                                 names(dt[, numeric_variables, with = FALSE]), 
                                                                                value = TRUE), 
                                          with = FALSE],  
 2,
 fivenum)  %>% 
 as.data.frame() 

rownames(dt_summary) <- 
  c("Min", "Q1", "Med", "Q3", "Max")

dt_summary

dt[, .(numeric_variables, id), with = FALSE]

The final line does not work because id is not in the numeric_variables category I created. If someone could direct me to using the correct by, tapply function with .SD I would appreciate it.

NOTE: This is part of a larger program where the user can either select one id to look at or compare two id variables at once. So, it needs to work for either one or many groups (eventually).

Upvotes: 2

Views: 171

Answers (1)

Rentrop
Rentrop

Reputation: 21507

i guess you are looking for .SDcols

ind  <- sapply(dt, is.numeric)
(dt_summary <- dt[,lapply(.SD, fivenum), .SDcols = ind])

Which gives you

   num1 num2 num3
1:    1    2  100
2:    1    2  100
3:    1    2  100
4:    1    2  100
5:    1    2  100

As data.table does not accept rownames you can do:

setDF(dt_summary)
rownames(dt_summary) <- 
  c("Min", "Q1", "Med", "Q3", "Max")

> dt_summary
    num1 num2 num3
Min    1    2  100
Q1     1    2  100
Med    1    2  100
Q3     1    2  100
Max    1    2  100

Upvotes: 3

Related Questions