W148SMH
W148SMH

Reputation: 172

rolling geometric means using dplyr and tbrf

I'd like to use dplyr and tbrf to calculate a 90 day rolling geometric mean and 90th percentile for each group 'Type'. The code below is generating percentiles for each date, not every 90 days. It also is wrongly generating duplicate rows.

side note:I first tried using %within% and creating an interval in the lubridate package.However, class Interval from lubridate is currently not supported in dplyr hence wanting to try tbrf. I have also tried tibbletime,RcppRoll and zoo's Rollapply

##sample data###

Value=c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
Dates = as.Date(c("2015-02-23","2015-04-20","2015-06-17",
"2015-08-20","2015-10-05","2015-12-22",
"2016-01-19","2016-03-29","2016-05-03",
"2016-07-21","2016-09-08","2016-11-07",
"2017-02-27","2017-04-19","2017-06-29",
"2017-08-24","2017-10-23","2017-12-28",
"2018-01-16","2018-03-14","2018-05-29",
"2018-07-24"))
Type = c(rep("A", 11), rep("B", 11))

df=data.frame(Value,Dates,Type)



######failed attempt 1####
df2=df %>% group_by(Type) %>% 
 tbr_gmean(Value, Dates, "days", 90) %>%
  tbr_misc(Value, Dates,"days",  90, quantile, .(0.9)) 

 ## failed attempt #2##
 start.date = min(df$Dates)
 breaks = seq(start.date - 30*3600*24, start.date + 30*3600*24, "90 days")
 df$group = cut(df$Dates, breaks=breaks)
 DF= df %>% group_by(Type,group) %>% 
 mutate(Count=n(),gm=geoMean(Value),
 percentile_90=quantile(Value,0.90))

Upvotes: 0

Views: 517

Answers (1)

Shirin Yavari
Shirin Yavari

Reputation: 682

Edited: Try this:

library(psych)
library(dplyr)
library(zoo)
dfmod<-df %>% 
    group_by(Type) %>% 
      arrange(Dates) %>%
        mutate(rnk = cumsum(c(TRUE, diff(Dates) > 5)))%>% #changed it from !=1 to reflect that you want the date difference to be within 5 days or less
          group_by(Type,rnk) %>% 
            mutate(GM = rollapply(Value, 2, geometric.mean, fill=NA, align="right"),
                    qt=rollapply(Value, 2, quantile, p=0.90, fill=NA, align="right")) #changed 5 to 2 so that the rolling sum is calculated for every 2 rows 
 head(dfmod)
## A tibble: 6 x 6
## Groups:   Type, rnk [1]
#  Dates      Type  Value   rnk    GM    qt
#  <date>     <fct> <dbl> <int> <dbl> <dbl>
#1 2018-10-03 A      35.3     1  NA    NA  
#2 2018-10-04 A      34.3     1  NA    NA  
#3 2018-10-05 A      34.6     1  NA    NA  
#4 2018-10-06 A      34.3     1  NA    NA  
#5 2018-10-07 A      34.1     1  34.5  35.1
#6 2018-10-08 A      34.7     1  34.4  34.6

Upvotes: 2

Related Questions