torakxkz
torakxkz

Reputation: 493

Calculate median value over the past X months, for each row and ID in R

I need to create a new column with the median value of the past 6 months (180 days) for each ID, considering the actual value. If there is no information or the previous record is > 6 months, the median value must be the value of that row.

Input Data

I have this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")

What I have to achieve is this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date"), median = c(956,986,995,995,990,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")

I have tried to use rollaplyr and rollmeadian from zoo package to follow the answers provide in this post Finding Cumulative Sum In R Using Conditions

But I can't come to the good result.

Thank you in advanced

Upvotes: 2

Views: 390

Answers (1)

Terru_theTerror
Terru_theTerror

Reputation: 5017

Try this solution:

Split the data.frame by id using function split:

list_df<-split(df,f=df$id)

Function providing the median on a single id value with date condition:

f_median<-function(i,db)
{
  return(median(db[as.POSIXct(db[,"date"])>=as.POSIXct(db[i,"date"]-180) & as.POSIXct(db[,"date"])<=as.POSIXct(db[i,"date"]),"value"]))
}

Iteration over splitted data.frame:

f<-function(db)
{
   return(sapply(rep(1:nrow(db)),f_median,db))
}

Your desired output

 median<-unlist(lapply(list_df,f))
 cbind(df,median)
   id value       date median
1   1   956 2012-09-18  956.0
2   2   986 2016-10-01  986.0
31  3   995 2000-01-09  995.0
32  3   995 2000-04-21  995.0
33  3   986 2000-10-13  990.5
41  4   700 2010-01-01  700.0
42  4   600 2011-10-01  600.0
43  4   995 2012-01-03  797.5
44  4   956 2012-01-05  956.0
45  4  1000 2012-02-10  975.5
46  4   986 2014-01-09  986.0

Upvotes: 1

Related Questions