Reputation: 1287
I'm trying to use a function that calculates some statistics by group in a data.table
as follows:
minmax <- function(vec) {
c(min(vec), max(vec))
}
library(data.table)
iris <- as.data.table(iris)
iris[, c('Min', 'Max') := minmax(Petal.Length), by = Species]
The result should be the min and max Petal.Length
by Species
and have as many rows as there are Species. That is, the same result as the code below:
merge(
iris[, .(Min = min(Petal.Length)), Species],
iris[, .(Max = max(Petal.Length)), Species],
on = 'Species'
)
Species Min Max
1: setosa 1.0 1.9
2: versicolor 3.0 5.1
3: virginica 4.5 6.9
Note: in my own code I want to do this in one go, not using merge()
.
Upvotes: 0
Views: 58
Reputation: 160407
There are a few things here:
The :=
is for adding columns to an existing frame, so it will not summarize as you've demonstrated. That is, DT[, a := b]
should always have exactly the same number of rows. I think this is not what you need.
You can do multiple-assignment in your summarizing code and do away with the merge
(perhaps this is similar to the code you said you do "in one go"):
iris[, .(Min = min(Petal.Length), Max = max(Petal.Length)), by = .(Species)]
# Species Min Max
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
But finally, you are asking how to use your function to get this. First attempts might be:
minmax <- function(vec) c(min(vec), max(vec))
iris[, minmax(Petal.Length), by = .(Species)]
# Species V1
# <fctr> <num>
# 1: setosa 1.0
# 2: setosa 1.9
# 3: versicolor 3.0
# 4: versicolor 5.1
# 5: virginica 4.5
# 6: virginica 6.9
iris[, as.list(minmax(Petal.Length)), by = .(Species)]
# Species V1 V2
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
iris[, setNames(as.list(minmax(Petal.Length)), c("Min", "Max")), by = .(Species)]
# Species Min Max
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
minmax <- function(vec) c(Min = min(vec), Max = max(vec))
iris[, as.list(minmax(Petal.Length)), by = .(Species)]
# Species Min Max
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
So we can change the function to return a list (optionally named).
minmax <- function(vec) list(min(vec), max(vec))
iris[, minmax(Petal.Length), by = .(Species)]
# Species V1 V2
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
iris[, setNames(minmax(Petal.Length), c("Min", "Max")), by = .(Species)]
# Species Min Max
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
minmax <- function(vec) list(Min = min(vec), Max = max(vec))
iris[, minmax(Petal.Length), by = .(Species)]
# Species Min Max
# <fctr> <num> <num>
# 1: setosa 1.0 1.9
# 2: versicolor 3.0 5.1
# 3: virginica 4.5 6.9
Upvotes: 3