Reputation: 195
I have a list containing Dates ("anchor_dates") and a data frame containing results by Group and Test Date ("groups").
anchor_dates <- as.Date(c("2015-07-20","2015-07-21","2015-07-22"))
set.seed(3)
groups <- data.frame(Test.Date = as.Date(c(rep("2015-07-18", 3), rep("2015-07-19", 3), rep("2015-07-20", 3), rep("2015-07-21", 3))),
Group = rep(c("AAA","BBB","CCC"), 4), Var1 = round(runif(12,0,10), ), Var2 = round(runif(12,0,7)))
> head(groups)
Test.Date Group Var1 Var2
1 2015-07-18 AAA 2 4
2 2015-07-18 BBB 8 4
3 2015-07-18 CCC 4 6
4 2015-07-19 AAA 3 6
5 2015-07-19 BBB 6 1
6 2015-07-19 CCC 6 5
I need to use the Dates in the "anchor_dates" list as anchor points in the "groups" set and aggregate variables by Group from the previous two Test Dates before the Anchor Date. There may not always be a result on each Test Date for a given Group, so I can't use a subset() subtracting the Anchor Date by 1 and 2. I need to be able to pull the last two Test Dates for each Group before the Anchor Date regardless of how far back they are and non sequential.
The following gets me close, however when I try to
unsplit(temp, groups$Group)
after the aggregation, the return is a flattened set which has something wrong with repeating the same Var sums and doesn't allow me to use Map() on the set afterwards adding in the anchor Date from the "anchor_dates" list.
f <- lapply(anchor_dates, function(x) {
lapply(split(groups, groups$Group), function(y) {
temp <- tail(y[order(y$Date == x), ], 2)
temp <- aggregate(cbind(Var1, Var2) ~ Group, data = temp, FUN = sum)
})
})
[[1]]
[[1]]$AAA
Group Var1 Var2
1 AAA 7 6
[[1]]$BBB
Group Var1 Var2
1 BBB 8 3
[[1]]$CCC
Group Var1 Var2
1 CCC 11 3
..............
The final result should instead be returned like below (or comparable solution)
[[1]]
Group Var1 Var2
1 AAA 5 10
2 BBB 14 5
3 CCC 10 11
[[2]]
Group Var1 Var2
1 AAA 4 12
2 BBB 9 3
3 CCC 12 7
[[3]]
Group Var1 Var2
1 AAA 7 6
2 BBB 8 3
3 CCC 11 3
Which allows me to end up with the following
f1 <- Map(cbind, f, anchor_dates)
do.call(rbind, f1)
Group Var1 Var2 Anchor.Date
1 AAA 5 10 2015-07-20
2 BBB 14 5 2015-07-20
3 CCC 10 11 2015-07-20
4 AAA 4 12 2015-07-21
5 BBB 9 3 2015-07-21
6 CCC 12 7 2015-07-21
7 AAA 7 6 2015-07-22
8 BBB 8 3 2015-07-22
9 CCC 11 3 2015-07-22
Upvotes: 3
Views: 161
Reputation: 35314
`rownames<-`(do.call(rbind,by(groups,groups$Group,function(g)
do.call(rbind,lapply(anchor_dates,function(anc) {
befores <- which(g$Test.Date<anc);
twobefore <- befores[order(anc-g$Test.Date[befores])[1:2]];
cbind(aggregate(.~Group,g[twobefore,names(g)!='Test.Date'],sum),Anchor.Date=anc);
}))
)),NULL);
## Group Var1 Var2 Anchor.Date
## 1 AAA 5 10 2015-07-20
## 2 AAA 4 12 2015-07-21
## 3 AAA 7 6 2015-07-22
## 4 BBB 14 5 2015-07-20
## 5 BBB 9 3 2015-07-21
## 6 BBB 8 3 2015-07-22
## 7 CCC 10 11 2015-07-20
## 8 CCC 12 7 2015-07-21
## 9 CCC 11 3 2015-07-22
Upvotes: 1
Reputation: 6749
I did this using a function with another function inside it. The outside function is suitable to be called using by()
, with subsetted data frames, while the internal one lets us examine multiple anchor dates.
func.get_agg_values <- function(df.groupdata,list_of_anchor_dates) {
df.returndata <- lapply(X = list_of_anchor_dates,
active.group.df = df.groupdata,
FUN = function(anchor.date,active.group.df) {
# Get order of the data frame in a proper order
active.group.df <- active.group.df[order(active.group.df$Test.Date,decreasing = TRUE),]
# Next, we subset active.group.df to those rows that are before the anchor date
# Since it was ordered, we can just take 1 and 2 as the last two dates before the anchor date
active.group.df <- active.group.df[as.numeric(active.group.df$Test.Date - anchor.date) < 0,][1:2,]
# Finally, get the sums and return a data frame
returned.row.df <- data.frame(Group = unique(active.group.df$Group),
Var1 = sum(active.group.df$Var1),
Var2 = sum(active.group.df$Var2),
Anchor.Date = anchor.date)
return(returned.row.df)
})
return(do.call(what = rbind.data.frame,
args = df.returndata))
}
f1 <- do.call(what = rbind.data.frame,
args = by(data = groups,
INDICES = groups$Group,
FUN = func.get_agg_values,
list_of_anchor_dates = anchor_dates))
> f1
Group Var1 Var2 Anchor.Date
AAA.1 AAA 5 10 2015-07-20
AAA.2 AAA 4 12 2015-07-21
AAA.3 AAA 7 6 2015-07-22
BBB.1 BBB 14 5 2015-07-20
BBB.2 BBB 9 3 2015-07-21
BBB.3 BBB 8 3 2015-07-22
CCC.1 CCC 10 11 2015-07-20
CCC.2 CCC 12 7 2015-07-21
CCC.3 CCC 11 3 2015-07-22
Upvotes: 2