Reputation: 11
ID Year Firm Score
1 2005 A 2
1 2006 A 5
1 2006 B 1
1 2007 A 36
1 2007 E 69
1 2008 E 8
1 2008 B 54
1 2009 A 25
1 2009 C 2
1 2010 E 2
1 2010 B 2
1 2011 A 5
1 2011 B 5
1 2012 A 4
1 2012 B 1
In above data, I want to perform group by 5 year rolling sum of 'Score' for each individual (ID) conditional on the current year Firm the person is working in. Let me explain this by example. Suppose I want rolling sum of 'Score' variable for year 2009, it should first check the firms in which the person (ID) is working. In 2009, person is working in A and C. It should then calculate 5 year rolling sum of 'Score' only for Firms A or C. Output of 5 year rolling sum for year 2009 will be (2 (for year 2005 firm A) + 5 ((for year 2006 firm A)) + 36 (for year 2007 firm A) + 27 (for year 2009 firm A and C) ) = 70. [Note: Year 2008 is ignored because person is neither registered in firm A nor firm C]
I also want to perform Rolling Average on similar lines. [Note: Original data has around 30 million observations]
Upvotes: 0
Views: 529
Reputation: 7526
rs <- as.data.frame(matrix(nrow =15, ncol = 4))
colnames(rs) <- c("ID", "Year", "Firm", "Score")
rs$ID <- 1
rs$Year <- c(2005,
2006,
2006,
2007,
2007,
2008,
2008,
2009,
2009,
2010,
2010,
2011,
2011,
2012,
2012)
rs$Firm <- c("A", "A", "B", "A", "E",
"E", "B", "A", "C", "E",
"B", "A", "B", "A", "B")
rs$Score <- c(2, 5, 1, 36, 69, 8,
54, 25, 2, 2, 2, 5, 5, 4,
1)
a <- rs$Year
for(i in unique(a)){
b <- rs[rs$Year == i,]
c <- (b$Firm)
d <- rs[rs$Year <= i & rs$Firm %in% c,]
print(paste(i, sum(d$Score)))
}
Output:
[1] "2005 2"
[1] "2006 8"
[1] "2007 112"
[1] "2008 132"
[1] "2009 70"
[1] "2010 136"
[1] "2011 135"
[1] "2012 140"
Upvotes: 1