Reputation: 623
I am removing erroneous values from my data set. Until now I have been using the following method:
Suppose z[,1] is my time series variable and e1, e2, ... are its elements.
std_d is sd(abs(diff(z[,1], lag=1))).
e1-e2 > std_d ... remove e2
e1-e3 > std_d ... remove e3
e1-e4 < std_d ... move on to e4
e4-e5 < std_d ... move on to e5
e5-e6 > std_d ... remove e6
e5-e7 < std_d ... move on to e7
I am doing this using the following code:
zx <- as.numeric(coredata(z[,1]))
coredata(z[,1]) <- Reduce(function(y, xx) {
  if (abs(tail(y[!is.na(y)], 1) - xx) > std_d) {
    c(y, NA)
  } else {
    c(y, xx)
  }
}, zx)
My question is:
I want to switch from std_d, i.e. the standard deviation of the lag-1 differences over the whole series, to a 'moving standard deviation'. For example, when checking e20, std_d should be the standard deviation of the lag-1 differences of the 15 elements before it and the 15 elements after it.
I was thinking of using rollmean from zoo, but I could not fit it into the function above. How can this be done?
Thank you for your time and consideration. Here is the sample data:
"timestamp" "mesured_distance" "IFC_Code" "from_sensor_to_river_bottom"
"1" "2012-06-04 21:30:09-05" 4818 995 5030
"2" "2012-06-04 21:15:11-05" 4820 995 5030
"3" "2012-06-04 21:00:10-05" 4818 995 5030
"4" "2012-06-04 20:45:10-05" 4817 995 5030
"5" "2012-06-04 20:30:09-05" 8816 995 5030
"6" "2012-06-04 20:15:09-05" 4816 995 5030
"7" "2012-06-04 20:00:08-05" 4811 995 5030
"8" "2012-06-04 19:45:07-05" 15009 995 5030
"9" "2012-06-04 19:30:07-05" 4810 995 5030
"10" "2012-06-04 19:15:09-05" 4795 995 5030
Upvotes: 0
Views: 225
Reputation: 263451
Perhaps... untested in absence of data:
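zx <- as.numeric(coredata(z[,1]))
coredata(z[,1]) <- Reduce(function(y, xx) {
  kept <- y[!is.na(y)]                  # values accepted so far
  if (length(y) < 15) {
    c(y, xx)                            # too little history: keep the value
  } else if (abs(tail(kept, 1) - xx) > sd(tail(kept, 15))) {
    c(y, NA)                            # jump exceeds the sd of the last 15 kept values
  } else {
    c(y, xx)
  }
}, zx)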
Can't be sure I got the parens and braces matched properly without testing
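The code above uses a trailing window of raw values. For the centred window the question describes (the lag-1 differences of the 15 elements before and after each point), a minimal base-R sketch could look like the following. The function name moving_sd_filter and its half_width argument are hypothetical, not part of zoo, and the sketch assumes the input has no NAs and that windows are simply truncated at the ends of the series.

```r
# Hypothetical helper (name and arguments are illustrative, not from zoo).
# x          numeric vector of readings in series order, assumed free of NAs
# half_width number of elements considered on each side of the point
moving_sd_filter <- function(x, half_width = 15) {
  stopifnot(length(x) >= 3, half_width >= 2)
  d <- abs(diff(x))                      # d[j] = |x[j + 1] - x[j]|
  n <- length(x)
  # Local threshold for point i: sd of the absolute lag-1 differences among
  # x[i - half_width] .. x[i + half_width], truncated at the series ends.
  thr <- vapply(seq_len(n), function(i) {
    lo <- max(1, i - half_width)
    hi <- min(length(d), i + half_width - 1)
    sd(d[lo:hi])
  }, numeric(1))
  out <- x
  last_good <- x[1]                      # assumption: the first reading is valid
  for (i in seq_len(n)[-1]) {
    if (abs(x[i] - last_good) > thr[i]) {
      out[i] <- NA                       # flag the reading as erroneous
    } else {
      last_good <- x[i]                  # accept it and move the reference point
    }
  }
  out
}

# Readings from the sample data in the question, in the order listed.
x <- c(4818, 4820, 4818, 4817, 8816, 4816, 4811, 15009, 4810, 4795)
moving_sd_filter(x)
```

With only ten readings every window spans the whole series, so the threshold equals the global std_d (about 4334): 15009 is flagged, but 8816 is not, because its jump of 3999 stays below the threshold. On a longer series the opposite problem can appear: in a quiet window the sd of the differences can be smaller than an ordinary step, so valid readings may be flagged. To apply it to the zoo object, something like coredata(z[,1]) <- moving_sd_filter(as.numeric(coredata(z[,1]))) should work, though that is untested here.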
Upvotes: 1