Reputation: 179

How to get average of lag period in a frequency as a new variable in r

I am trying to arrange my panel data set to create new variable using averages of regular occurring lag periods. My example data set look like below and can be accessed using below codes

     Time ID  Value1 Value2
1  Jan-14  A     12     NA
2  Feb-14  A     14     NA
3  Mar-14  A     15     NA
4  Apr-14  A     18     NA
5  May-14  A     10     NA
6  Jun-14  A     12  13.67
7  Jul-14  A     13  15.67
8  Aug-14  A     14  14.33
9  Jan-14  B     32     NA
10 Feb-14  B     14     NA
11 Mar-14  B     15     NA
12 Apr-14  B     18     NA
13 May-14  B     20     NA
14 Jun-14  B     12  20.33
15 Jul-14  B     13  15.67
16 Aug-14  B     14  17.78

df<-structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L, 
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14", 
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class = "factor"), 
    ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
    Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L, 
    15L, 18L, 20L, 12L, 13L, 14L)), .Names = c("Time", "ID", 
"Value1"), class = "data.frame", row.names = c(NA, -16L))

I want to create new variable called Value2 considering the average of 3 months corresponding to the previous 4th-6th month for each observation (this is just for the example, my monthly data set goes for 12 years across 40 individuals, and also I will have to consider a higher lag than 3 months, perhaps 14 months). Here I need to consider the ID as well because need to retain the panel structure. For all the observations which do not have complete 3 months previously, should indicate as NA. For example Jun_14_A should get the average from Mar_14_A to Jan_14_A. (12+14+15)/3 =13.67

I referred some examples here but those do not provide quite what I am looking for specially lag averages and indicating as NA s when there are no previous 3 complete months.

Thank you in advance for any help

Upvotes: 3

Answers (4)

Haboryme

Reputation: 4761

One way to do it, it's not refined so maybe someone (me if i get the time) will later come with a more elegant solution:
First thing is to convert your df$time into an actual date.

library(zoo)
df$Time=as.yearmon(as.character(df$Time),"%b-%y")
library(dplyr)
df%>%
  group_by(ID)%>%
  mutate(Value2 = ( lag(Value1, 3) + lag(Value1, 4) + lag(Value1, 5) ) / 3)

     Time     ID Value1   Value2
   <fctr> <fctr>  <int>    <dbl>
1  Jan-14      A     12       NA
2  Feb-14      A     14       NA
3  Mar-14      A     15       NA
4  Apr-14      A     18       NA
5  May-14      A     10       NA
6  Jun-14      A     12 13.66667
7  Jul-14      A     13 15.66667
8  Aug-14      A     14 14.33333
9  Jan-14      B     32       NA
10 Feb-14      B     14       NA
11 Mar-14      B     15       NA
12 Apr-14      B     18       NA
13 May-14      B     20       NA
14 Jun-14      B     12 20.33333
15 Jul-14      B     13 15.66667
16 Aug-14      B     14 17.66667

I created 3 variable of with lag 3, 4 and 5 and simply calculate the mean, any NA (not 3 complete months) will return NA.

Upvotes: 1

G. Grothendieck

Reputation: 269674

Try rollapplyr from zoo. The width of list(-3:-5) means it will use the components which are offset -3, -4 and -5 from the current point at each iteration:

library(zoo)
roll <- function(x) rollapplyr(x, list(-3:-5), mean, fill = NA)
transform(df, Value2 = ave(Value1, ID, FUN = roll))

giving the following data frame:

     Time ID Value1   Value2
1  Jan-14  A     12       NA
2  Feb-14  A     14       NA
3  Mar-14  A     15       NA
4  Apr-14  A     18       NA
5  May-14  A     10       NA
6  Jun-14  A     12 13.66667
7  Jul-14  A     13 15.66667
8  Aug-14  A     14 14.33333
9  Jan-14  B     32       NA
10 Feb-14  B     14       NA
11 Mar-14  B     15       NA
12 Apr-14  B     18       NA
13 May-14  B     20       NA
14 Jun-14  B     12 20.33333
15 Jul-14  B     13 15.66667
16 Aug-14  B     14 17.66667

Note: Although we have not made the change since it is not needed to answer the question, note that you may wish to use zoo's "yearmon" class for the first column. With that class year/month values sort correctly yet display nicely. That is,

transform(df, Time = as.yearmon(Time, "%b-%y"), Value2 = ...as above...)

Upvotes: 2

David Arenburg

Reputation: 92292

Here's another way using just data.table and it's shift function combined with Reduce (This is almost a dupe of this)

library(data.table)
setDT(df)[, Value2 := Reduce(`+`, shift(Value1, 3:5))/3, by = ID]
df
#       Time ID Value1   Value2
#  1: Jan-14  A     12       NA
#  2: Feb-14  A     14       NA
#  3: Mar-14  A     15       NA
#  4: Apr-14  A     18       NA
#  5: May-14  A     10       NA
#  6: Jun-14  A     12 13.66667
#  7: Jul-14  A     13 15.66667
#  8: Aug-14  A     14 14.33333
#  9: Jan-14  B     32       NA
# 10: Feb-14  B     14       NA
# 11: Mar-14  B     15       NA
# 12: Apr-14  B     18       NA
# 13: May-14  B     20       NA
# 14: Jun-14  B     12 20.33333
# 15: Jul-14  B     13 15.66667
# 16: Aug-14  B     14 17.66667

Upvotes: 2

J.R.

Reputation: 3888

We could try:

library(data.table)
setDT(df)
f <- function(x) if(anyNA(x[1:6])) NA else mean(x[1:3]) 
df[, Value2 := zoo::rollapply(Value1, 6, f, align = "right", partial = TRUE), by = ID]

> df
      Time ID Value1   Value2
 1: Jan-14  A     12       NA
 2: Feb-14  A     14       NA
 3: Mar-14  A     15       NA
 4: Apr-14  A     18       NA
 5: May-14  A     10       NA
 6: Jun-14  A     12 13.66667
 7: Jul-14  A     13 15.66667
 8: Aug-14  A     14 14.33333
 9: Jan-14  B     32       NA
10: Feb-14  B     14       NA
11: Mar-14  B     15       NA
12: Apr-14  B     18       NA
13: May-14  B     20       NA
14: Jun-14  B     12 20.33333
15: Jul-14  B     13 15.66667
16: Aug-14  B     14 17.66667

Upvotes: 1

How to get average of lag period in a frequency as a new variable in r

Answers (4)

Related Questions