Reputation: 91
I have a dataframe with ID, type, and the area I would like to perform two operations at once
ID Type Area
1 Aa 0.02
2 Ag 0.12
2 Ag 0.14
2 Ag 0.80
2 Bm 0.20
2 Xm 0.13
The expected outcome is
ID Type count area
1 Aa 1 0.02
2 Ag 3 1.06 (sum)
2 Bm 1 0.20
2 Xm 1 0.13
I have up to 100-150 ID and type, count and basal area varies for each type with the same ID, what would be the best approach to perform sum and count same time and keep type and ID in dataframe?
Thanks
Upvotes: 4
Views: 10254
Reputation: 269371
1) Base R -- aggregate Counts are just the sum of a constant column of ones so using DF
shown reproducibly in the Note at the end we add such a column and aggregate using sum
. No packages are used.
aggregate(cbind(Count, Area) ~ ID + Type, transform(DF, Count = 1), sum)
giving:
ID Type Count Area
1 1 Aa 1 0.02
2 2 Ag 3 1.06
3 2 Bm 1 0.20
4 2 Xm 1 0.13
2) Base R -- by An approach using only base R that does not rely on the trick of adding a column of ones is to use by
. The by
call produces a list of class by
and the do.call("rbind", ...)
converts that to a data frame.
do.call("rbind", by(DF, DF[1:2], with,
data.frame(ID = ID[1], Type = Type[1], Count = length(ID), Area = sum(Area))))
giving:
ID Type Count Area
1 1 Aa 1 0.02
2 2 Ag 3 1.06
3 2 Bm 1 0.20
4 2 Xm 1 0.13
3) sqldf SQL allows the separate and simultaneous application of count and sum.
library(sqldf)
sqldf("select ID, Type, count(*) as Count, sum(Area) as Area
from DF
group by 1, 2")
giving:
ID Type Count Area
1 1 Aa 1 0.02
2 2 Ag 3 1.06
3 2 Bm 1 0.20
4 2 Xm 1 0.13
4) data.table The data.table package can also be used.
library(data.table)
DT <- as.data.table(DF)
DT[, .(Count = .N, Area = sum(Area)), by = "ID,Type"]
giving:
ID Type Count Area
1: 1 Aa 1 0.02
2: 2 Ag 3 1.06
3: 2 Bm 1 0.20
4: 2 Xm 1 0.13
Lines <- "ID Type Area
1 Aa 0.02
2 Ag 0.12
2 Ag 0.14
2 Ag 0.80
2 Bm 0.20
2 Xm 0.13 "
DF <- read.table(text = Lines, header = TRUE)
Upvotes: 7
Reputation: 213
If your data is large, I recommend data.table
:
library(data.table)
setDT(df)[, .(Area=sum(Area), Count=.N), .(ID, Type)]
Upvotes: 2
Reputation: 28826
Another possibility in plyr
:
library(plyr)
ddply(DF, .(ID,Type), summarize, Count=length(Area), Area=sum(Area))
# ID Type Count Area
# 1 1 Aa 1 0.02
# 2 2 Ag 3 1.06
# 3 2 Bm 1 0.20
# 4 2 Xm 1 0.13
Upvotes: 2
Reputation: 886938
We can use dplyr
. Packages are used
library(dplyr)
df1 %>%
group_by(ID, Type) %>%
summarise(count = n(), Area = sum(Area))
# A tibble: 4 x 4
# Groups: ID [2]
# ID Type count Area
# <int> <chr> <int> <dbl>
#1 1 Aa 1 0.02
#2 2 Ag 3 1.06
#3 2 Bm 1 0.2
#4 2 Xm 1 0.13
or with by
from base R
- Note that base R
includes some packages as well...
by(df1['Area'], df1[1:2], FUN = function(x) cbind(count = nrow(x), Area = sum(x)))
df1 <- structure(list(ID = c(1L, 2L, 2L, 2L, 2L, 2L), Type = c("Aa",
"Ag", "Ag", "Ag", "Bm", "Xm"), Area = c(0.02, 0.12, 0.14, 0.8,
0.2, 0.13)), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 6