Reputation: 495
I'm translating some Stata commands for time-series data into R, using the zoo package to calculate moving averages. Here is what my data looks like:
data <- cbind(c(1960:1970), c(95.5, 95.3, 95.3, 95.7, 95.7, 95.7,
95.1, 95.1, 95.1, 95, 95))
[,1] [,2]
[1,] 1960 95.5
[2,] 1961 95.3
[3,] 1962 95.3
[4,] 1963 95.7
[5,] 1964 95.7
[6,] 1965 95.7
[7,] 1966 95.1
[8,] 1967 95.1
[9,] 1968 95.1
[10,] 1969 95.0
[11,] 1970 95.0
I'll make this into a data.frame:
data <- as.data.frame(data)
Now I can use the rollmean function to calculate the moving averages for turnout with my data:
data$turnout <- rollmean(data[,2], 1, fill = NA)
And this is what I get:
V1 V2 turnout
1 1960 95.5 95.5
2 1961 95.3 95.3
3 1962 95.3 95.3
4 1963 95.7 95.7
5 1964 95.7 95.7
6 1965 95.7 95.7
7 1966 95.1 95.1
8 1967 95.1 95.1
9 1968 95.1 95.1
10 1969 95.0 95.0
11 1970 95.0 95.0
This is all well and good, but my issue is that I want my column turnout (the moving average) to start at 1961 instead of 1960. This code does not exclude the first observation, which is what I am trying to do.
For reference, the equivalent Stata command would be:
tssmooth ma m1turnout = turnout, window (1 0)
I have already tried using the align = "right" argument, but that doesn't seem to do the trick. Any ideas?
Thanks in advance!
Edit: to clarify, I'm doing this across different window lengths. In Stata the full code is as follows, where since is a variable that describes the number of years since an intervention.
foreach y of numlist 1(1)10{
tssmooth ma m`y'turnout = turnout, window (`y' 0)
}
gen dvturnout=.
foreach y of numlist 2(1)9{
replace dvturnout = l1.turnout if since==1
replace dvturnout = m`y'turnout if since==`y' & m`y'turnout!=.
replace dvturnout = m10turnout if (since==10 & m10turnout!=.) | (since==. & redist!=. & m10turnout!=.)
}
foreach y of numlist 1(1)10{
drop m`y'turnout
}
My ultimate goal is this dvturnout variable.
When I try what I presume corresponds to the first section of the code in Stata, that is:
foreach y of numlist 1(1)10{
tssmooth ma m`y'turnout = turnout, window (`y' 0)
}
In R, I do this (where [,35] is the column I'm starting to add variables to):
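# Outer loop restored; assumes data_countries is a list of data frames
for (i in seq_along(data_countries)) {
  for (j in 1:10) {
    data_countries[[i]][, 35 + j] <- rollmean(data_countries[[i]][, 13], j, fill = NA, align = "right")
  }
}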
And it spits out this for me:
year since V36 V37 V38 V39 V40 V41 V42 V43 V44 V45
1 1960 NA 95.5 NA NA NA NA NA NA NA NA NA
2 1961 NA 95.3 95.40 NA NA NA NA NA NA NA NA
3 1962 NA 95.3 95.30 95.36667 NA NA NA NA NA NA NA
4 1963 NA 95.7 95.50 95.43333 95.450 NA NA NA NA NA NA
5 1964 NA 95.7 95.70 95.56667 95.500 95.50 NA NA NA NA NA
6 1965 NA 95.7 95.70 95.70000 95.600 95.54 95.53333 NA NA NA NA
7 1966 NA 95.1 95.40 95.50000 95.550 95.50 95.46667 95.47143 NA NA NA
8 1967 NA 95.1 95.10 95.30000 95.400 95.46 95.43333 95.41428 95.4250 NA NA
9 1968 NA 95.1 95.10 95.10000 95.250 95.34 95.40000 95.38571 95.3750 95.38889 NA
10 1969 NA 95.0 95.05 95.06667 95.075 95.20 95.28333 95.34286 95.3375 95.33333 95.35
11 1970 NA 95.0 95.00 95.03333 95.050 95.06 95.16667 95.24286 95.3000 95.30000 95.30
These numbers are all fine, but they're "shifted" down from where I want them to be. Here is what the same operation gives me in Stata:
year dvturnout m1turnout m2turnout m3turnout m4turnout m5turnout m6turnout m7turnout m8turnout m9turnout m10turnout
1960
1961 95.5 95.5 95.5 95.5 95.5 95.5 95.5 95.5 95.5 95.5
1962 95.3 95.4 95.4 95.4 95.4 95.4 95.4 95.4 95.4 95.4
1963 95.3 95.3 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667
1964 95.7 95.5 95.43333 95.45 95.45 95.45 95.45 95.45 95.45 95.45
1965 95.7 95.7 95.56667 95.5 95.5 95.5 95.5 95.5 95.5 95.5
1966 95.7 95.7 95.7 95.6 95.54 95.53333 95.53333 95.53333 95.53333 95.53333
1967 95.1 95.39999 95.5 95.55 95.5 95.46667 95.47143 95.47143 95.47143 95.47143
1968 95.1 95.1 95.3 95.39999 95.46 95.43333 95.41428 95.425 95.425 95.425
1969 95.1 95.1 95.1 95.25 95.34 95.39999 95.38571 95.375 95.38889 95.38889
1970 95 95.05 95.06667 95.075 95.2 95.28333 95.34286 95.3375 95.33334 95.35
Upvotes: 3
Views: 3736
Reputation: 4989
What you need is a moving average function that does not include the current observation. Thankfully, w_i_l_l wrote a function that does exactly that. What complicates things: where a moving average does not have enough data (e.g., k = 4, but only 3 data points), the writer of your paper filled it in with the result from the previous column. I would really advise against doing that, as it can (and usually will) lead to major confusion unless it is pointed out very explicitly.
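# w_i_l_l's moving average function
# Trailing moving average over the last n values, including the current one
mav <- function(x, n) {
  stats::filter(x, rep(1 / n, n), sides = 1)
}
# Moving average over the previous n values, excluding the current one
mavback <- function(x, n) {
  a <- mav(x, 1)                    # the series itself
  b <- mav(x, n + 1)                # mean of the current value and the n before it
  c <- (1 / n) * ((n + 1) * b - a)  # drop the current value and re-average
  return(c)
}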
# Create 10 columns with moving averages of k = 1:10
result <- NULL
for(i in 1:10){
result <- cbind(result,mavback(test[,2], i))
}
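# Give proper names to the columns. Note that `:` binds tighter than `-`,
# so 1:ncol(result)-1 is 0:9 and column m<k>turnout holds the window of length k+1
colnames(result) <- paste0("m", 1:ncol(result)-1,"turnout")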
# Combine result with base data
result <- cbind(test,data.frame(result))
# WONKY STATISTICS: If there is a NA (= not enough data for a
# moving average) fill it up with previous column's result
for(i in 4:ncol(result)){
# Inner loop starts from the second row (row 1 has no earlier data)
for(j in 2:nrow(result)){
# Check for NA
if(is.na(result[j,i])){
result[j,i] <- result[j,i-1]
}
}
}
> result
year turnout m0turnout m1turnout m2turnout m3turnout m4turnout m5turnout m6turnout m7turnout m8turnout m9turnout
1 1960 95.5 NA NA NA NA NA NA NA NA NA NA
2 1961 95.3 95.5 95.50 95.50000 95.50000 95.50000 95.50000 95.50000 95.50000 95.50000 95.50000
3 1962 95.3 95.3 95.40 95.40000 95.40000 95.40000 95.40000 95.40000 95.40000 95.40000 95.40000
4 1963 95.7 95.3 95.30 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667 95.36667
5 1964 95.7 95.7 95.50 95.43333 95.45000 95.45000 95.45000 95.45000 95.45000 95.45000 95.45000
6 1965 95.7 95.7 95.70 95.56667 95.50000 95.50000 95.50000 95.50000 95.50000 95.50000 95.50000
7 1966 95.1 95.7 95.70 95.70000 95.60000 95.54000 95.53333 95.53333 95.53333 95.53333 95.53333
8 1967 95.1 95.1 95.40 95.50000 95.55000 95.50000 95.46667 95.47143 95.47143 95.47143 95.47143
9 1968 95.1 95.1 95.10 95.30000 95.40000 95.46000 95.43333 95.41429 95.42500 95.42500 95.42500
10 1969 95.0 95.1 95.10 95.10000 95.25000 95.34000 95.40000 95.38571 95.37500 95.38889 95.38889
11 1970 95.0 95.0 95.05 95.06667 95.07500 95.20000 95.28333 95.34286 95.33750 95.33333 95.35000
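The filled-in table above is numerically the same as averaging whatever earlier observations are available when the window is longer than the history. A minimal base R sketch of that idea follows; `lagged_ma` is a hypothetical helper name, not part of zoo or of w_i_l_l's code, and it assumes the series has no gaps or missing values.

```r
turnout <- c(95.5, 95.3, 95.3, 95.7, 95.7, 95.7, 95.1, 95.1, 95.1, 95, 95)

# Hypothetical helper: mean of up to k previous values, excluding the current one.
# Early rows use however many earlier observations exist; row 1 stays NA.
lagged_ma <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)[-1]) {
    idx <- max(1, i - k):(i - 1)  # previous k positions, clipped at the start
    out[i] <- mean(x[idx])
  }
  out
}

# One column per window length 1..10, named like the Stata variables
ma <- sapply(1:10, function(k) lagged_ma(turnout, k))
colnames(ma) <- paste0("m", 1:10, "turnout")
round(ma[, c("m1turnout", "m3turnout", "m10turnout")], 5)
```

Because the partial windows are computed directly, no second pass over the NAs is needed, and the column names line up with the window lengths.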
# Without the fill-in loop, the NAs remain:
> result
year turnout m0turnout m1turnout m2turnout m3turnout m4turnout m5turnout m6turnout m7turnout m8turnout m9turnout
1 1960 95.5 NA NA NA NA NA NA NA NA NA NA
2 1961 95.3 95.5 NA NA NA NA NA NA NA NA NA
3 1962 95.3 95.3 95.40 NA NA NA NA NA NA NA NA
4 1963 95.7 95.3 95.30 95.36667 NA NA NA NA NA NA NA
5 1964 95.7 95.7 95.50 95.43333 95.450 NA NA NA NA NA NA
6 1965 95.7 95.7 95.70 95.56667 95.500 95.50 NA NA NA NA NA
7 1966 95.1 95.7 95.70 95.70000 95.600 95.54 95.53333 NA NA NA NA
8 1967 95.1 95.1 95.40 95.50000 95.550 95.50 95.46667 95.47143 NA NA NA
9 1968 95.1 95.1 95.10 95.30000 95.400 95.46 95.43333 95.41429 95.4250 NA NA
10 1969 95.0 95.1 95.10 95.10000 95.250 95.34 95.40000 95.38571 95.3750 95.38889 NA
11 1970 95.0 95.0 95.05 95.06667 95.075 95.20 95.28333 95.34286 95.3375 95.33333 95.35
# Data used in the code above
test <- data.frame(cbind(year = c(1960:1970),
turnout = c(95.5, 95.3, 95.3, 95.7, 95.7,
95.7, 95.1, 95.1, 95.1, 95, 95)))
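For the dvturnout variable from the question, the Stata loop boils down to picking, in each row, the lagged moving average whose window length equals since. A rough sketch under stated assumptions: the `since` values below are invented for illustration (the question's sample shows only NA), `lagged_ma` is a hypothetical helper, and the extra Stata clause involving redist is left out.

```r
turnout <- c(95.5, 95.3, 95.3, 95.7, 95.7, 95.7, 95.1, 95.1, 95.1, 95, 95)
since   <- c(NA, NA, 1, 2, 3, 4, 5, 6, 7, 8, 9)  # hypothetical values

# Hypothetical helper: mean of up to k previous values, excluding the current one
lagged_ma <- function(x, k) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)[-1]) {
    out[i] <- mean(x[max(1, i - k):(i - 1)])
  }
  out
}

# Rows are years, columns are window lengths 1..10
ma <- sapply(1:10, function(k) lagged_ma(turnout, k))

# For each row, take the column that matches `since`; NA where `since` is missing
dvturnout <- vapply(seq_along(since), function(i) {
  if (is.na(since[i])) NA_real_ else ma[i, since[i]]
}, numeric(1))

data.frame(year = 1960:1970, since, dvturnout)
```

With since equal to 1 this returns the previous year's turnout, which is what `l1.turnout` does in the Stata code.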
Upvotes: 4
Reputation: 495
I found the simplest way to work this was with the lag function.
# Assumes dplyr's lag(); stats::lag() does not shift the values of a plain vector
data$turnout <- dplyr::lag(rollmean(data[,2], 1, fill = NA), 1)
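A caveat on which lag is in play: on a plain numeric vector, base R's `stats::lag()` only attaches shifted time attributes and leaves the values in place, so the shift has to come from elsewhere. The sketch below, using only base R and a shortened copy of the turnout series, contrasts that with a manual shift and with lagging a `ts` object.

```r
x <- c(95.5, 95.3, 95.3, 95.7)

# stats::lag() on a plain vector: values come back in the same positions
same_values <- as.numeric(stats::lag(x, 1))
identical(same_values, x)

# Manual one-step shift: each row sees the previous row's value
shifted <- c(NA, head(x, -1))
shifted

# On a ts object the time index moves, and cbind() aligns the series by time
x_ts    <- ts(x, start = 1960)
prev_ts <- stats::lag(x_ts, -1)  # k = -1 moves the series one year forward
aligned <- cbind(x_ts, prev_ts)
aligned
```

The aligned result has one extra row for 1964, because the lagged series extends one year past the original.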
Upvotes: 1
Reputation: 3053
Maybe you are looking for something like this:
library(zoo)
library(forecast)
data <- cbind(c(1960:1970), c(95.5, 95.3, 95.3, 95.7, 95.7, 95.7, 95.1, 95.1, 95.1, 95, 95))
x1 <- ts(data = data[, 2], start = 1960, end = 1970, frequency = 1)
x2 <- cbind(x1, turnout = zoo::rollmeanr(x1, k = 2))
Print the time series object:
x2
Time Series:
Start = 1960
End = 1970
Frequency = 1
x1 turnout
1960 95.5 NA
1961 95.3 95.40
1962 95.3 95.30
1963 95.7 95.50
1964 95.7 95.70
1965 95.7 95.70
1966 95.1 95.40
1967 95.1 95.10
1968 95.1 95.10
1969 95.0 95.05
1970 95.0 95.00
Plot:
forecast::autoplot(x2)
Upvotes: 1