Reputation: 172
I'd like to use dplyr and tbrf to calculate a 90 day rolling geometric mean and 90th percentile for each group 'Type'. The code below is generating percentiles for each date, not every 90 days. It also is wrongly generating duplicate rows.
side note:I first tried using %within% and creating an interval in the lubridate package.However, class Interval from lubridate is currently not supported in dplyr hence wanting to try tbrf. I have also tried tibbletime,RcppRoll and zoo's Rollapply
##sample data###
Value=c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
Dates = as.Date(c("2015-02-23","2015-04-20","2015-06-17",
"2015-08-20","2015-10-05","2015-12-22",
"2016-01-19","2016-03-29","2016-05-03",
"2016-07-21","2016-09-08","2016-11-07",
"2017-02-27","2017-04-19","2017-06-29",
"2017-08-24","2017-10-23","2017-12-28",
"2018-01-16","2018-03-14","2018-05-29",
"2018-07-24"))
Type = c(rep("A", 11), rep("B", 11))
df=data.frame(Value,Dates,Type)
######failed attempt 1####
df2=df %>% group_by(Type) %>%
tbr_gmean(Value, Dates, "days", 90) %>%
tbr_misc(Value, Dates,"days", 90, quantile, .(0.9))
## failed attempt #2##
start.date = min(df$Dates)
breaks = seq(start.date - 30*3600*24, start.date + 30*3600*24, "90 days")
df$group = cut(df$Dates, breaks=breaks)
DF= df %>% group_by(Type,group) %>%
mutate(Count=n(),gm=geoMean(Value),
percentile_90=quantile(Value,0.90))
Upvotes: 0
Views: 517
Reputation: 682
Edited: Try this:
library(psych)
library(dplyr)
library(zoo)
dfmod<-df %>%
group_by(Type) %>%
arrange(Dates) %>%
mutate(rnk = cumsum(c(TRUE, diff(Dates) > 5)))%>% #changed it from !=1 to reflect that you want the date difference to be within 5 days or less
group_by(Type,rnk) %>%
mutate(GM = rollapply(Value, 2, geometric.mean, fill=NA, align="right"),
qt=rollapply(Value, 2, quantile, p=0.90, fill=NA, align="right")) #changed 5 to 2 so that the rolling sum is calculated for every 2 rows
head(dfmod)
## A tibble: 6 x 6
## Groups: Type, rnk [1]
# Dates Type Value rnk GM qt
# <date> <fct> <dbl> <int> <dbl> <dbl>
#1 2018-10-03 A 35.3 1 NA NA
#2 2018-10-04 A 34.3 1 NA NA
#3 2018-10-05 A 34.6 1 NA NA
#4 2018-10-06 A 34.3 1 NA NA
#5 2018-10-07 A 34.1 1 34.5 35.1
#6 2018-10-08 A 34.7 1 34.4 34.6
Upvotes: 2