Brian
Brian

Reputation: 195

Aggregating based on split groups and previous time periods in R

I have a list containing Dates ("anchor_dates") and a data frame containing results by Group and Test Date ("groups").

anchor_dates <- as.Date(c("2015-07-20","2015-07-21","2015-07-22"))
set.seed(3)
groups <- data.frame(Test.Date = as.Date(c(rep("2015-07-18", 3), rep("2015-07-19", 3), rep("2015-07-20", 3), rep("2015-07-21", 3))), 
              Group = rep(c("AAA","BBB","CCC"), 4), Var1 = round(runif(12,0,10), ), Var2 = round(runif(12,0,7)))

> head(groups)
    Test.Date Group Var1 Var2
1  2015-07-18   AAA    2    4
2  2015-07-18   BBB    8    4
3  2015-07-18   CCC    4    6
4  2015-07-19   AAA    3    6
5  2015-07-19   BBB    6    1
6  2015-07-19   CCC    6    5

I need to use the Dates in the "anchor_dates" list as anchor points in the "groups" set and aggregate variables by Group from the previous two Test Dates before the Anchor Date. There may not always be a result on each Test Date for a given Group, so I can't use a subset() subtracting the Anchor Date by 1 and 2. I need to be able to pull the last two Test Dates for each Group before the Anchor Date regardless of how far back they are and non sequential.

The following gets me close, however when I try to

unsplit(temp, groups$Group)

after the aggregation, the return is a flattened set which has something wrong with repeating the same Var sums and doesn't allow me to use Map() on the set afterwards adding in the anchor Date from the "anchor_dates" list.

f <- lapply(anchor_dates, function(x) {
    lapply(split(groups, groups$Group), function(y) {
        temp <- tail(y[order(y$Date == x), ], 2)
        temp <- aggregate(cbind(Var1, Var2) ~ Group, data = temp, FUN = sum)
     })
})

[[1]]
[[1]]$AAA
  Group Var1 Var2
1   AAA    7    6

[[1]]$BBB
  Group Var1 Var2
1   BBB    8    3

[[1]]$CCC
  Group Var1 Var2
1   CCC   11    3
..............

The final result should instead be returned like below (or comparable solution)

[[1]]
  Group Var1 Var2
1   AAA    5   10
2   BBB   14    5
3   CCC   10   11

[[2]]
  Group Var1 Var2
1   AAA    4   12
2   BBB    9    3
3   CCC   12    7

[[3]]
  Group Var1 Var2
1   AAA    7    6
2   BBB    8    3
3   CCC   11    3

Which allows me to end up with the following

f1 <- Map(cbind, f, anchor_dates) 
do.call(rbind, f1)

  Group Var1 Var2 Anchor.Date
1   AAA    5   10  2015-07-20
2   BBB   14    5  2015-07-20
3   CCC   10   11  2015-07-20
4   AAA    4   12  2015-07-21
5   BBB    9    3  2015-07-21
6   CCC   12    7  2015-07-21
7   AAA    7    6  2015-07-22
8   BBB    8    3  2015-07-22
9   CCC   11    3  2015-07-22

Upvotes: 3

Views: 161

Answers (2)

bgoldst
bgoldst

Reputation: 35314

`rownames<-`(do.call(rbind,by(groups,groups$Group,function(g)
    do.call(rbind,lapply(anchor_dates,function(anc) {
        befores <- which(g$Test.Date<anc);
        twobefore <- befores[order(anc-g$Test.Date[befores])[1:2]];
        cbind(aggregate(.~Group,g[twobefore,names(g)!='Test.Date'],sum),Anchor.Date=anc);
    }))
)),NULL);
##   Group Var1 Var2 Anchor.Date
## 1   AAA    5   10  2015-07-20
## 2   AAA    4   12  2015-07-21
## 3   AAA    7    6  2015-07-22
## 4   BBB   14    5  2015-07-20
## 5   BBB    9    3  2015-07-21
## 6   BBB    8    3  2015-07-22
## 7   CCC   10   11  2015-07-20
## 8   CCC   12    7  2015-07-21
## 9   CCC   11    3  2015-07-22

Upvotes: 1

TARehman
TARehman

Reputation: 6749

I did this using a function with another function inside it. The outside function is suitable to be called using by(), with subsetted data frames, while the internal one lets us examine multiple anchor dates.

func.get_agg_values <- function(df.groupdata,list_of_anchor_dates) {

    df.returndata <- lapply(X = list_of_anchor_dates,
                            active.group.df = df.groupdata,
                            FUN = function(anchor.date,active.group.df) {

                                # Get order of the data frame in a proper order
                                active.group.df <- active.group.df[order(active.group.df$Test.Date,decreasing = TRUE),]

                                # Next, we subset active.group.df to those rows that are before the anchor date
                                # Since it was ordered, we can just take 1 and 2 as the last two dates before the anchor date
                                active.group.df <- active.group.df[as.numeric(active.group.df$Test.Date - anchor.date) < 0,][1:2,]

                                # Finally, get the sums and return a data frame
                                returned.row.df <- data.frame(Group = unique(active.group.df$Group),
                                                          Var1 = sum(active.group.df$Var1),
                                                          Var2 = sum(active.group.df$Var2),
                                                          Anchor.Date = anchor.date)
                                return(returned.row.df)
                            })
    return(do.call(what = rbind.data.frame,
                   args = df.returndata))
}
f1 <- do.call(what = rbind.data.frame,
              args = by(data = groups,
                        INDICES = groups$Group,
                        FUN = func.get_agg_values,
                        list_of_anchor_dates = anchor_dates))

> f1
      Group Var1 Var2 Anchor.Date
AAA.1   AAA    5   10  2015-07-20
AAA.2   AAA    4   12  2015-07-21
AAA.3   AAA    7    6  2015-07-22
BBB.1   BBB   14    5  2015-07-20
BBB.2   BBB    9    3  2015-07-21
BBB.3   BBB    8    3  2015-07-22
CCC.1   CCC   10   11  2015-07-20
CCC.2   CCC   12    7  2015-07-21
CCC.3   CCC   11    3  2015-07-22

Upvotes: 2

Related Questions