Bob Jansen
Bob Jansen

Reputation: 1287

Group by while calculating multiple values

I'm trying to use a function that calculates some statistics by group in a data.table as follows:

minmax <- function(vec) {
  c(min(vec), max(vec))
}
library(data.table)
iris <- as.data.table(iris)
iris[, c('Min', 'Max') := minmax(Petal.Length), by = Species]

The result should be the min and max Petal.Length by Species and have as many rows as there are Species. That is, the same result as the code below:

merge(
  iris[, .(Min = min(Petal.Length)), Species],
  iris[, .(Max = max(Petal.Length)), Species],
  on = 'Species'
)

      Species Min Max
1:     setosa 1.0 1.9
2: versicolor 3.0 5.1
3:  virginica 4.5 6.9

Note: in my own code I want to do this in one go, not using merge().

Upvotes: 0

Views: 58

Answers (1)

r2evans
r2evans

Reputation: 160407

There are a few things here:

  1. The := is for adding columns to an existing frame, so it will not summarize as you've demonstrated. That is, DT[, a := b] should always have exactly the same number of rows. I think this is not what you need.

  2. You can do multiple-assignment in your summarizing code and do away with the merge (perhaps this is similar to the code you said you do "in one go"):

    iris[, .(Min = min(Petal.Length), Max = max(Petal.Length)), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
  3. But finally, you are asking how to use your function to get this. First attempts might be:

    minmax <- function(vec) c(min(vec), max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species    V1
    #        <fctr> <num>
    # 1:     setosa   1.0
    # 2:     setosa   1.9
    # 3: versicolor   3.0
    # 4: versicolor   5.1
    # 5:  virginica   4.5
    # 6:  virginica   6.9
    
    iris[, as.list(minmax(Petal.Length)), by = .(Species)]
    #       Species    V1    V2
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    iris[, setNames(as.list(minmax(Petal.Length)), c("Min", "Max")), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    minmax <- function(vec) c(Min = min(vec), Max = max(vec))
    iris[, as.list(minmax(Petal.Length)), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    

    So we can change the function to return a list (optionally named).

    minmax <- function(vec) list(min(vec), max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species    V1    V2
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    iris[, setNames(minmax(Petal.Length), c("Min", "Max")), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    minmax <- function(vec) list(Min = min(vec), Max = max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    

Upvotes: 3

Related Questions