ira

Reputation: 2644

How to create column names from a character vector when using data.table

I have a data.table like so:

dt = data.table(id_1 = c(rep(1:3, 5)), id_2 = sort(rep(c('A', 'B', 'C'), 5)), value_1 = rnorm(15, 1, 1), value_2 = rpois(15, 1))

I would like to create a function that groups the table by columns specified in one parameter and applies an operation (let's say sum) to several other columns specified in another parameter. Finally, I'd like to specify the names of the new columns through a third parameter. My problem is that I don't know how to set names from a character vector when I am not using assignment by reference (:=).

The following two approaches achieve exactly what I want, but I don't like how they do it:

Approach 1: assign by reference, then keep only one record per group (and drop the original columns).

dt_aggregator_1 <- function(data,
                          group_cols = c('id_1', 'id_2'),
                          new_names = c('sum_value_1', 'sum_value_2'),
                          value_cols = c('value_1', 'value_2')){
  data_out = data
  data_out[,(new_names) := lapply(.SD, function(x){sum(x)}),by = group_cols, .SDcols = value_cols]
  data_out[,lapply(.SD, max), by = group_cols, .SDcols = new_names]
}

Approach 2: rename the columns after grouping. I assume this is a much better approach.

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){
  data_out = data[,lapply(.SD, function(x){sum(x)}),by = group_cols, .SDcols = value_cols]
  setnames(data_out, value_cols, new_names)
  data_out[]
}

My question is whether, in approach 2, I can somehow set the names while performing the grouping operation, so that it is reduced to one line of code instead of two.

Upvotes: 0

Views: 697

Answers (2)

Ronak Shah

Reputation: 388862

You can call setNames in the same line, which makes this a one-liner:

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){

  # Operate on the `data` argument, not the global `dt`
  data[, setNames(lapply(.SD, sum), new_names), by = group_cols, .SDcols = value_cols]

}
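
As a quick check, here is a minimal, self-contained sketch of the same idea. The function name dt_aggregator_3 and the seed are illustrative and not from the question; the function works on its `data` argument rather than a global table.

```r
library(data.table)

set.seed(1)  # illustrative seed, only so the example is reproducible
dt <- data.table(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c('A', 'B', 'C'), 5)),
                 value_1 = rnorm(15, 1, 1),
                 value_2 = rpois(15, 1))

# Hypothetical name: group, sum, and name the result columns in one expression
dt_aggregator_3 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){
  # setNames() renames the list returned by lapply() before data.table
  # turns it into the output columns
  data[, setNames(lapply(.SD, sum), new_names), by = group_cols, .SDcols = value_cols]
}

res <- dt_aggregator_3(dt)
names(res)  # "id_1" "id_2" "sum_value_1" "sum_value_2"
names(dt)   # the input table keeps its original columns
```

Note that setnames() in approach 2 renames by reference without copying, so the two-line version is not expensive. On large tables it may be worth benchmarking both, since wrapping j in setNames() can prevent data.table from applying its internal optimisation for grouped sums.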

Upvotes: 1

Anil Kumar

Reputation: 445

You can try the dplyr library:

library(dplyr)

dt1 <- dt %>% group_by(id_1,id_2) %>%
  summarise(
    sum_value_1 = sum(value_1),
    sum_value_2 = sum(value_2)
  )

dt1
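
The version above hard-codes the column names. If they need to come from character vectors, as in the question, one possible parameterised variant is sketched below. It assumes dplyr >= 1.0.0 (for across()), and the function name dplyr_aggregator is hypothetical.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only so the example is reproducible
df <- data.frame(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c('A', 'B', 'C'), 5)),
                 value_1 = rnorm(15, 1, 1),
                 value_2 = rpois(15, 1))

# Hypothetical helper: all column names are passed as character vectors
dplyr_aggregator <- function(data,
                             group_cols = c('id_1', 'id_2'),
                             new_names = c('sum_value_1', 'sum_value_2'),
                             value_cols = c('value_1', 'value_2')){
  # Named lookup vector: names are the new column names, values the old ones
  lookup <- setNames(value_cols, new_names)
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(across(all_of(value_cols), sum), .groups = "drop") %>%
    rename(all_of(lookup))
}

out <- dplyr_aggregator(df)
names(out)  # "id_1" "id_2" "sum_value_1" "sum_value_2"
```

This still renames in a separate step, so it is closer to approach 2 from the question than to a single expression.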

Upvotes: 1
