S Front

Reputation: 353

What is the most efficient way to sum over all observations in previous years?

I have a data set similar to the following:

For each unique year, I would like to sum, separately for every unique id in the data set, all the treaties that id has had. What is the best approach for this kind of problem? My real dataset is much larger than this example, and I have heard that for loops are inefficient for this.

id <- rep(1:3, 4)                                # ids 1, 2, 3 repeated for each year
year <- rep(1990:1993, each = 3)                 # 1990 x3, 1991 x3, 1992 x3, 1993 x3
treaty <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)  # 1 = treaty in that id-year
data <- cbind(id, year, treaty)                  # note: cbind() returns a numeric matrix

What I have tried is something like the following, but I haven't managed to get it to work. Could I also do this with the plyr or dplyr package?

for (i in unique(data$year)) {
  for (j in unique(data$year)) {
    prior_temp <- sum(dat$treaty[which(dat$id == j & yearunique[1:i])])
  }
  dat$prior <- prior_temp
}
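For comparison, a loop that does work could look like the sketch below. It is only an illustration, assuming "prior" means strictly earlier years of the same id. It builds `data` with `data.frame()` instead of `cbind()`, because `cbind()` returns a matrix and `data$year` fails on a matrix. Each row is compared with every other row, so the cost grows quadratically with the number of rows; that is why a grouped cumulative sum scales much better on a large data set.

```r
# Example data as a data frame (cbind() would give a matrix, where `$` does not work)
data <- data.frame(
  id = rep(1:3, 4),
  year = rep(1990:1993, each = 3),
  treaty = c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)
)

# For each row, add up the treaties of the same id in strictly earlier years.
# This does not depend on row order, but it makes O(n^2) comparisons.
data$prior <- 0
for (i in seq_len(nrow(data))) {
  earlier <- data$id == data$id[i] & data$year < data$year[i]
  data$prior[i] <- sum(data$treaty[earlier])
}

data$prior
# [1] 0 0 0 1 0 0 1 1 1 1 1 2
```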

Upvotes: 0

Views: 49

Answers (1)

G. Grothendieck

Reputation: 269694

If by prior you mean the current and all prior years for that id, then:

transform(data, prior = ave(treaty, id, FUN = cumsum))

or, if you really mean only the prior years for that id, then:

transform(data, prior = ave(treaty, id, FUN = cumsum) - treaty)
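Both one-liners rely on `ave()` applying `cumsum()` to each id's values in the order the rows appear, so they assume the rows are already in chronological order within each id (true for the example data). A defensive sketch that sorts first (the `order()` step is an addition, not part of the answer above) and checks both variants on the example data:

```r
id <- rep(1:3, 4)
year <- rep(1990:1993, each = 3)
treaty <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)
data <- cbind(id, year, treaty)

# Sort by id, then year, so cumsum() runs chronologically within each id
data <- data[order(data[, "id"], data[, "year"]), ]

# Current and prior years
incl <- transform(data, prior = ave(treaty, id, FUN = cumsum))
# Prior years only: subtract the current year's value
excl <- transform(data, prior = ave(treaty, id, FUN = cumsum) - treaty)

incl$prior
# [1] 1 1 1 2 0 1 1 1 0 1 2 3
excl$prior
# [1] 0 1 1 1 0 0 1 1 0 0 1 2
```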

With dplyr it would be the following; replace cumsum(treaty) with cumsum(treaty) - treaty if you really mean only prior years.

library(dplyr)

data %>%
  as.data.frame %>%
  group_by(id) %>%
  mutate(prior = cumsum(treaty)) %>%
  ungroup
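The same idea as a dplyr sketch that adds an explicit `arrange(id, year)` (not in the answer above) so the result does not depend on the input row order, and computes both variants side by side. The column name `upto` is purely illustrative.

```r
library(dplyr)

id <- rep(1:3, 4)
year <- rep(1990:1993, each = 3)
treaty <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1)
data <- cbind(id, year, treaty)

out <- data %>%
  as.data.frame() %>%
  arrange(id, year) %>%              # chronological order within each id
  group_by(id) %>%
  mutate(
    upto  = cumsum(treaty),          # current and prior years (illustrative name)
    prior = cumsum(treaty) - treaty  # prior years only
  ) %>%
  ungroup()
```

Sorting inside the pipeline costs one extra pass over the data, but it guards against silently wrong sums if the real data set is not already ordered by year.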

Upvotes: 1
