Aditya Salapaka

Reputation: 335

How can I convert these nested loops into an R loop function like sapply or tapply or

I have this code which I run over a data frame t.

x <- data.frame()  # start empty so rbind() has something to append to
for (i in years){
    for (j in type){
        x <- rbind(x, cbind(i, j,
                            sum(t[(t$year == i) & (t$type == j),]$Emissions,
                                na.rm = TRUE)))
    }
}

Basically, I have two vectors years and type. I'm finding the sum of each category and merging that into a data frame. The above code works, but I cannot figure out how to use one of the loop functions.

Upvotes: 0

Views: 82

Answers (2)

farnsy

Reputation: 2470

Yes, there are ways to do this using the apply functions. I'm going to suggest a high performance approach using dplyr, though.

library(dplyr)
x <- t %>% 
     group_by(year,type) %>% 
     summarize(SumEmmissions=sum(Emissions,na.rm=TRUE)) 

I think you will find that it is much faster than either a loop or apply approach.
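For completeness, here is a minimal sketch of what the apply-family route mentioned above might look like with tapply. The data frame emis and its values are made up for illustration (they are not the asker's data), and as.data.frame(as.table(...)) is just one of several ways to get back to one row per combination.

```r
# Hypothetical toy data standing in for the asker's data frame `t`
emis <- data.frame(
  year      = c(2000, 2000, 2001, 2001, 2001),
  type      = c("a", "b", "a", "a", "b"),
  Emissions = c(1, 2, NA, 4, 5)
)

# tapply() splits Emissions by every year/type combination and sums each cell;
# na.rm = TRUE is passed through to sum()
sums <- tapply(emis$Emissions, list(emis$year, emis$type), sum, na.rm = TRUE)

# Reshape the year-by-type matrix into long format, one row per combination
x <- as.data.frame(as.table(sums), responseName = "Emissions")
names(x)[1:2] <- c("year", "type")
x
```

Note that year and type come back as factors here, and a year/type combination with no rows at all shows up as NA rather than the 0 the original loop would produce.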

=================== Proof, as requested ===============

library(dplyr)

N <- 1000000
Nyear <- 50
Ntype <- 40
myt <- data.frame(year=sample.int(50,N,replace=TRUE),
                  type=sample.int(4,N,replace=TRUE),
                  Emissions=rnorm(N)
                  )
years <- 1:Nyear
type <- 1:Ntype

v1 <- function(){
    x <- myt %>%
         group_by(year,type) %>%
         summarize(SumEmmissions=sum(Emissions,na.rm=TRUE))
}

v2 <- function(){
    x <- data.frame()
    for (i in years){
        for (j in type){
            x <- rbind(x, cbind(i, j,
                       sum(myt[(myt$year == i) & (myt$type == j),]$Emissions,
                           na.rm = TRUE)))
        }
    }
}

v3 <- function(){
    t0 <- myt[myt$year %in% years & myt$type %in% type, ]
    x <- aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)
}

system.time(v1())
   user  system elapsed
  0.051   0.000   0.051

system.time(v2())
   user  system elapsed
176.482   0.402 177.231

system.time(v3())
   user  system elapsed
  7.758   0.011   7.783

As the data size and the number of groups increase, so does the performance gap.

Upvotes: 2

G. Grothendieck

Reputation: 269734

Pick out all rows for which year is in years and type is in type, giving t0. Then aggregate Emissions by year and type.

t0 <- t[t$year %in% years & t$type %in% type, ]
aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)

If the years and type vectors contain all years and types then the first line could be omitted and t0 in the second line replaced with t.
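Since the question has no data, here is a small self-contained sketch of the two steps above. The data frame emis and the contents of years and type are invented for illustration only.

```r
# Hypothetical toy data; the values of `emis`, `years`, and `type` are made up
emis <- data.frame(
  year      = c(2000, 2000, 2001, 2001, 2002),
  type      = c("a", "b", "a", "a", "b"),
  Emissions = c(1, 2, NA, 4, 5)
)
years <- c(2000, 2001)
type  <- c("a", "b")

# Step 1: keep only rows whose year and type are in the requested vectors
t0 <- emis[emis$year %in% years & emis$type %in% type, ]

# Step 2: sum Emissions within each year/type group that is present in t0
res <- aggregate(Emissions ~ year + type, t0, sum, na.rm = TRUE)
res
```

One difference from the loop is worth keeping in mind: the formula interface of aggregate drops rows with NA before summing (its default na.action is na.omit), and combinations that have no remaining rows are left out of the result instead of appearing with a sum of 0.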

Next time please make your example reproducible.

Update: Some corrections.

Upvotes: 1
