Reputation: 335
I have this code which I run over a data frame t.
for (i in years){
  for (j in type){
    x <- rbind(x, cbind(i, j,
                        sum(t[(t$year == i) & (t$type == j),]$Emissions,
                            na.rm = TRUE)))
  }
}
Basically, I have two vectors, years and type. I'm finding the sum of each category and merging that into a data frame. The above code works, but I cannot figure out how to use one of the loop functions.
Upvotes: 0
Views: 82
Reputation: 2470
Yes, there are ways to do this using the apply functions. I'm going to suggest a high-performance approach using dplyr, though.
library(dplyr)
x <- t %>%
  group_by(year, type) %>%
  summarize(SumEmissions = sum(Emissions, na.rm = TRUE))
I think you will find that it is much faster than either a loop or apply approach.
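For completeness, here is a minimal base-R sketch of the apply-style route mentioned above, which this answer does not otherwise show. The toy data frame emis (standing in for t) and its values are made up for illustration. expand.grid builds every year/type pair, mirroring the nested loops, and mapply computes one sum per pair.

```r
# Toy data standing in for the question's data frame `t` (values are made up).
emis <- data.frame(
  year      = c(1999, 1999, 2002, 2002, 2002),
  type      = c("POINT", "ROAD", "POINT", "POINT", "ROAD"),
  Emissions = c(1.5, 2.0, NA, 4.0, 0.5)
)
years <- c(1999, 2002)
type  <- c("POINT", "ROAD")

# One row per (year, type) pair, the same pairs the nested loops visit.
combos <- expand.grid(year = years, type = type, stringsAsFactors = FALSE)

# mapply walks the two columns in parallel and sums the matching rows.
combos$SumEmissions <- mapply(
  function(y, ty) sum(emis$Emissions[emis$year == y & emis$type == ty],
                      na.rm = TRUE),
  combos$year, combos$type
)

print(combos)
```

This still scans the whole data frame once per pair, so it mainly tidies up the loop rather than speeding it up.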
=================== Proof, as requested ===============
library(dplyr)
N <- 1000000
Nyear <- 50
Ntype <- 40
# Simulated data: 50 distinct years, but only 4 distinct types
myt <- data.frame(year = sample.int(50, N, replace = TRUE),
                  type = sample.int(4, N, replace = TRUE),
                  Emissions = rnorm(N))
years <- 1:Nyear
type <- 1:Ntype
# v1: dplyr group_by + summarize
v1 <- function(){
  x <- myt %>%
    group_by(year, type) %>%
    summarize(SumEmissions = sum(Emissions, na.rm = TRUE))
}
# v2: the nested loops from the question
v2 <- function(){
  x <- data.frame()
  for (i in years){
    for (j in type){
      x <- rbind(x, cbind(i, j,
                          sum(myt[(myt$year == i) & (myt$type == j),]$Emissions, na.rm = TRUE)))
    }
  }
}
# v3: base R aggregate
v3 <- function(){
  t0 <- myt[myt$year %in% years & myt$type %in% type, ]
  x <- aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)
}
system.time(v1())
   user  system elapsed
  0.051   0.000   0.051
system.time(v2())
   user  system elapsed
176.482   0.402 177.231
system.time(v3())
   user  system elapsed
  7.758   0.011   7.783
As the data size and the number of groups increase, so does the performance gap.
Upvotes: 2
Reputation: 269734
Pick out all rows for which year is in years and type is in type, giving t0. Then aggregate Emissions by year and type.
t0 <- t[t$year %in% years & t$type %in% type, ]
aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)
If the years and type vectors contain all years and types, then the first line could be omitted and t0 in the second line replaced with t.
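As a quick check of that last point, the sketch below runs both variants on a small made-up data frame (called emis here, a stand-in for t) in which years and type cover every value present. The data values are assumptions for illustration only.

```r
# Toy data standing in for the question's data frame `t` (values are made up).
emis <- data.frame(
  year      = c(1999, 1999, 2002, 2002, 2002),
  type      = c("POINT", "ROAD", "POINT", "POINT", "ROAD"),
  Emissions = c(1.5, 2.0, NA, 4.0, 0.5)
)
years <- c(1999, 2002)       # covers every year in emis
type  <- c("POINT", "ROAD")  # covers every type in emis

# Variant 1: filter first, then aggregate.
t0 <- emis[emis$year %in% years & emis$type %in% type, ]
filtered <- aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)

# Variant 2: aggregate the full data frame directly.
direct <- aggregate(Emissions ~ year + type, emis, sum, na.rm = TRUE)

print(filtered)
print(isTRUE(all.equal(filtered, direct)))
```

One difference from the loop in the question: the formula interface of aggregate drops rows with NA before grouping (its default na.action is na.omit), so a year/type combination whose Emissions are all NA, or one with no rows at all, will be missing from the result rather than reported as 0.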
Next time please make your example reproducible.
Update: Some corrections.
Upvotes: 1