hans glick
hans glick

Reputation: 2611

In R, aggregate statistics with data.table when the group by variable is a string

I got this data.table set :

set.seed(1234)
r=100
ref=c("banana","chocolate","apple")
ref2=c("florida","california")
products=sample(ref,r,TRUE)
cities=sample(ref2,r,TRUE)
quantity=sample(seq(1:5),100,TRUE)
dt=data.table(products,quantity,cities)
head(dt,5)

  products quantity     cities
1:    apple        3    florida
2:   banana        5    florida
3:   banana        3    florida
4:    apple        2 california
5:    apple        5    Florida

I want a function which take the index of the group by variable as an argument. Like this :

sumfn=function(dt,index){
  var_group=c("products","cities")[index]
  res=dt[,list(Q=sum(quantity)),by=list(var_group)]
  return(res)
}

So that, if I ran this command :

count_dt=sumfn(dt,1)

My output would be :

    products   Q
1: chocolate  72
2:    banana  91
3:     apple 136

However, when I do run the command above, instead, I get this error message :

Error in `[.data.table`(dt, , list(Q = sum(quantity)), by = list(var_group)) : 
  The items in the 'by' or 'keyby' list are length (1). Each must be same length as rows in x or number of rows returned by i (100).

Anyone know how to fix that? I'm stuck with this problem for an hour.

Upvotes: 0

Views: 103

Answers (1)

Uwe
Uwe

Reputation: 42544

Remove list in the byclause, i.e.,

sumfn <- function(dt, index) {
    var_group <- c("products", "cities")[index]
    res <- dt[, list(Q = sum(quantity)), by = var_group]
    return(res)
}

and you will get

sumfn(dt,1)
    products   Q
1:     apple 116
2:    banana  83
3: chocolate 102

Upvotes: 2

Related Questions