Reputation: 3720
I'm looking for a way to do simple aggregates / counts via data.table.
Consider the iris data, which has 50 observations per species. To count the observations per species I have to summaries over a column other than species, for example "Sepal.Length".
library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]
I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.
This is what I would prefer to say, but I don't get valid output:
dt[,length(Species), Species]
> dt[,length(Sepal.Length), Species]
Species V1
1: setosa 50
2: versicolor 50
3: virginica 50
> dt[,length(Species), Species]
Species V1
1: setosa 1
2: versicolor 1
3: virginica 1
Upvotes: 35
Views: 19462
Reputation: 620
A more generizable method is
dt[, table(col2bCount)%>%as.data.frame, by= .(col1,col2,col3,...)]
The key advantage is that columns col2bCount
and col1,2,3
can be different(or the same), meaning you can caculate the frequency in one column grouped by a different column(s).
Upvotes: 0
Reputation: 115505
data.table
has a couple of symbols that can be used within the j
expression. Notably
.N
will give you the number of number of rows in each group.see ?data.table
under the details for by
Advanced: When grouping by
by
or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.....
.N is an integer, length 1, containing the number of rows in the group.
For example:
dt[, .N ,by = Species]
Species N
1: setosa 50
2: versicolor 50
3: virginica 50
Upvotes: 39