Reputation: 179
I am trying to arrange my panel data set to create a new variable from averages of regularly occurring lag periods. My example data set looks like the one below and can be reproduced with the code that follows.
Time ID Value1 Value2
1 Jan-14 A 12 NA
2 Feb-14 A 14 NA
3 Mar-14 A 15 NA
4 Apr-14 A 18 NA
5 May-14 A 10 NA
6 Jun-14 A 12 13.67
7 Jul-14 A 13 15.67
8 Aug-14 A 14 14.33
9 Jan-14 B 32 NA
10 Feb-14 B 14 NA
11 Mar-14 B 15 NA
12 Apr-14 B 18 NA
13 May-14 B 20 NA
14 Jun-14 B 12 20.33
15 Jul-14 B 13 15.67
16 Aug-14 B 14 17.67
df<-structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L,
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14",
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class = "factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L,
15L, 18L, 20L, 12L, 13L, 14L)), .Names = c("Time", "ID",
"Value1"), class = "data.frame", row.names = c(NA, -16L))
I want to create a new variable called Value2 holding the average of the 3 months corresponding to the previous 4th-6th months of each observation (that is, lags 3 to 5). This is just for the example: my monthly data set covers 12 years across 40 individuals, and I will also have to consider a window longer than 3 months, perhaps 14 months. I need to take the ID into account as well, because the panel structure has to be retained. All observations that do not have 3 complete previous months should be set to NA. For example, Jun_14_A should get the average of Jan_14_A to Mar_14_A: (12+14+15)/3 = 13.67
I looked at some examples here, but they do not provide quite what I am looking for, especially the lagged averages and the NAs for observations without 3 complete previous months.
Thank you in advance for any help
Upvotes: 3
Views: 1044
Reputation: 4761
Here is one way to do it. It is not refined, so maybe someone (me, if I get the time) will later come up with a more elegant solution.
The first thing is to convert your df$Time into an actual date.
library(zoo)
df$Time=as.yearmon(as.character(df$Time),"%b-%y")
library(dplyr)
df%>%
group_by(ID)%>%
mutate(Value2 = ( lag(Value1, 3) + lag(Value1, 4) + lag(Value1, 5) ) / 3)
Time ID Value1 Value2
<fctr> <fctr> <int> <dbl>
1 Jan-14 A 12 NA
2 Feb-14 A 14 NA
3 Mar-14 A 15 NA
4 Apr-14 A 18 NA
5 May-14 A 10 NA
6 Jun-14 A 12 13.66667
7 Jul-14 A 13 15.66667
8 Aug-14 A 14 14.33333
9 Jan-14 B 32 NA
10 Feb-14 B 14 NA
11 Mar-14 B 15 NA
12 Apr-14 B 18 NA
13 May-14 B 20 NA
14 Jun-14 B 12 20.33333
15 Jul-14 B 13 15.66667
16 Aug-14 B 14 17.66667
I take the lags of order 3, 4 and 5 and simply compute their mean; if any of them is NA (fewer than 3 complete months), the result is NA.
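Since the question mentions longer windows (perhaps 14 months), the three hand-written lag() calls can be generalised. The following is a minimal sketch, not part of the original answer: lag_mean is a hypothetical helper name, and the data frame is rebuilt inline with only the ID and Value1 columns so that the block runs on its own. It assumes the rows are already in time order within each ID.

```r
library(dplyr)

# Hypothetical helper: mean of x lagged by each offset in `lags`.
# If any lagged value is NA, the sum and therefore the mean is NA.
lag_mean <- function(x, lags) {
  Reduce(`+`, lapply(lags, function(k) dplyr::lag(x, k))) / length(lags)
}

# Value1 copied from the question; rows are in time order within each ID.
df <- data.frame(
  ID     = rep(c("A", "B"), each = 8),
  Value1 = c(12, 14, 15, 18, 10, 12, 13, 14,
             32, 14, 15, 18, 20, 12, 13, 14)
)

res <- df %>%
  group_by(ID) %>%
  mutate(Value2 = lag_mean(Value1, 3:5)) %>%
  ungroup()

res
```

For a 14-month window starting 3 months back, lags = 3:16 would be passed instead of 3:5.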
Upvotes: 1
Reputation: 269674
Try rollapplyr from zoo. A width of list(-3:-5) means it will use the components at offsets -3, -4 and -5 from the current point at each iteration:
library(zoo)
roll <- function(x) rollapplyr(x, list(-3:-5), mean, fill = NA)
transform(df, Value2 = ave(Value1, ID, FUN = roll))
giving the following data frame:
Time ID Value1 Value2
1 Jan-14 A 12 NA
2 Feb-14 A 14 NA
3 Mar-14 A 15 NA
4 Apr-14 A 18 NA
5 May-14 A 10 NA
6 Jun-14 A 12 13.66667
7 Jul-14 A 13 15.66667
8 Aug-14 A 14 14.33333
9 Jan-14 B 32 NA
10 Feb-14 B 14 NA
11 Mar-14 B 15 NA
12 Apr-14 B 18 NA
13 May-14 B 20 NA
14 Jun-14 B 12 20.33333
15 Jul-14 B 13 15.66667
16 Aug-14 B 14 17.66667
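To see what the offsets do, the same call can be run on a toy vector. This is a small sketch with made-up input (1:8), not part of the original answer.

```r
library(zoo)

x <- 1:8  # toy input; position i holds the value i

# At position i the function sees x[i - 3], x[i - 4] and x[i - 5].
# Positions 1 to 5 lack at least one of these and are filled with NA.
out <- rollapplyr(x, list(-3:-5), mean, fill = NA)
out
```

At position 6 the window is x[3], x[2] and x[1], so the result there is 2; positions 7 and 8 give 3 and 4.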
Note: Although we have not made the change, since it is not needed to answer the question, you may wish to use zoo's "yearmon" class for the first column. With that class, year/month values sort correctly yet display nicely. That is,
transform(df, Time = as.yearmon(Time, "%b-%y"), Value2 = ...as above...)
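A small sketch of the sorting point, using three made-up month labels rather than the full data. It assumes an English locale so that "%b" matches abbreviations such as "Jan".

```r
library(zoo)

labels <- c("Mar-14", "Jan-14", "Feb-14")

# As plain text (or as factor levels) the labels sort alphabetically.
txt <- sort(labels)
txt

# As yearmon they sort chronologically: January, February, March.
ym <- sort(as.yearmon(labels, "%b-%y"))
ym
```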
Upvotes: 2
Reputation: 92292
Here's another way using just data.table and its shift function combined with Reduce (this is almost a dupe of this):
library(data.table)
setDT(df)[, Value2 := Reduce(`+`, shift(Value1, 3:5))/3, by = ID]
df
# Time ID Value1 Value2
# 1: Jan-14 A 12 NA
# 2: Feb-14 A 14 NA
# 3: Mar-14 A 15 NA
# 4: Apr-14 A 18 NA
# 5: May-14 A 10 NA
# 6: Jun-14 A 12 13.66667
# 7: Jul-14 A 13 15.66667
# 8: Aug-14 A 14 14.33333
# 9: Jan-14 B 32 NA
# 10: Feb-14 B 14 NA
# 11: Mar-14 B 15 NA
# 12: Apr-14 B 18 NA
# 13: May-14 B 20 NA
# 14: Jun-14 B 12 20.33333
# 15: Jul-14 B 13 15.66667
# 16: Aug-14 B 14 17.66667
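The one-liner works because shift, given a vector of lags, returns a list with one lagged copy per lag, and Reduce adds those copies element-wise. A short sketch on group A's values only (the vector is copied from the question; the variable names are illustrative):

```r
library(data.table)

x <- c(12, 14, 15, 18, 10, 12, 13, 14)  # Value1 for ID "A"

lagged <- shift(x, 3:5)  # list of 3 vectors: x lagged by 3, 4 and 5
length(lagged)

# Element-wise sum of the three lagged vectors, divided by their count.
# An NA in any of the three makes the sum, and so the average, NA.
avg <- Reduce(`+`, lagged) / length(lagged)
round(avg, 2)
```

Dividing by length(lagged) rather than a literal 3 keeps the expression valid if the lag range is widened, for example to 3:16.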
Upvotes: 2
Reputation: 3888
We could try:
library(data.table)
setDT(df)
f <- function(x) if(anyNA(x[1:6])) NA else mean(x[1:3])
df[, Value2 := zoo::rollapply(Value1, 6, f, align = "right", partial = TRUE), by = ID]
> df
Time ID Value1 Value2
1: Jan-14 A 12 NA
2: Feb-14 A 14 NA
3: Mar-14 A 15 NA
4: Apr-14 A 18 NA
5: May-14 A 10 NA
6: Jun-14 A 12 13.66667
7: Jul-14 A 13 15.66667
8: Aug-14 A 14 14.33333
9: Jan-14 B 32 NA
10: Feb-14 B 14 NA
11: Mar-14 B 15 NA
12: Apr-14 B 18 NA
13: May-14 B 20 NA
14: Jun-14 B 12 20.33333
15: Jul-14 B 13 15.66667
16: Aug-14 B 14 17.66667
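How f behaves on short and full windows can be checked directly in base R. This sketch only exercises the helper from the answer on two hand-picked windows taken from group A; the names short and full are illustrative.

```r
f <- function(x) if (anyNA(x[1:6])) NA else mean(x[1:3])

# Partial window (only 3 values available): x[4:6] is NA, so f returns NA.
short <- f(c(12, 14, 15))

# Full window of 6, ending at Jun-14 for ID "A": the mean of its first
# three values, i.e. the observations 5, 4 and 3 months back.
full <- f(c(12, 14, 15, 18, 10, 12))

short
full
```

With partial = TRUE the early windows are shorter than 6, so indexing x[1:6] produces NAs there and those rows come out as NA.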
Upvotes: 1