I am working with air quality data and need to detect runs of zeros (two or more consecutive zeros) and replace every element of such a run with NA. Isolated zeros must remain in the data.
Here's an example of the data:
date TEMP PM10 O3 (ug/m3)
5/25/2012 18:00:00 23,8 55 6,30397494404564
5/25/2012 19:00:00 22,8 75 0
5/25/2012 20:00:00 19,8 75 1,99689085129112
5/25/2012 21:00:00 15,3 98 11,1542397707455
5/25/2012 22:00:00 16,2 64 2,02173552751248
5/25/2012 23:00:00 16,3 44 0
5/25/2012 0:00:00 17,1 65 0
5/26/2012 1:00:00 17,5 73 0
5/26/2012 2:00:00 17,2 62 0
5/26/2012 3:00:00 17,1 45 0
5/26/2012 4:00:00 17 37 0
5/26/2012 5:00:00 17,3 29 0
5/26/2012 6:00:00 17,2 50 0
5/26/2012 7:00:00 17,1 36 0
5/26/2012 8:00:00 17,1 43 0
5/26/2012 9:00:00 17,9 45 0
5/26/2012 10:00:00 19,5 72 0
5/26/2012 11:00:00 21,3 85 3,57609276547571
5/26/2012 12:00:00 22,3 81 12,8699598468684
Here is the solution I came up with:
df$oz<-ifelse(df$`O3 (ug/m3)`==0 & lag(df$`O3 (ug/m3)`)==0,NA,df$`O3 (ug/m3)`)
date TEMP PM10 O3 (ug/m3) oz
5/25/2012 18:00:00 23,8 55 6,30397494404564 6,30397494404564
5/25/2012 19:00:00 22,8 75 0 0
5/25/2012 20:00:00 19,8 75 1,99689085129112 1,99689085129112
5/25/2012 21:00:00 15,3 98 11,1542397707455 11,1542397707455
5/25/2012 22:00:00 16,2 64 2,02173552751248 2,02173552751248
5/25/2012 23:00:00 16,3 44 0 NA
5/25/2012 0:00:00 17,1 65 0 NA
5/26/2012 1:00:00 17,5 73 0 NA
5/26/2012 2:00:00 17,2 62 0 NA
5/26/2012 3:00:00 17,1 45 0 NA
5/26/2012 4:00:00 17 37 0 NA
5/26/2012 5:00:00 17,3 29 0 NA
5/26/2012 6:00:00 17,2 50 0 NA
5/26/2012 7:00:00 17,1 36 0 NA
5/26/2012 8:00:00 17,1 43 0 NA
5/26/2012 9:00:00 17,9 45 0 NA
5/26/2012 10:00:00 19,5 72 0 NA
5/26/2012 11:00:00 21,3 85 3,57609276547571 3,57609276547571
5/26/2012 12:00:00 22,3 81 12,8699598468684 12,8699598468684
This function works by computing the run lengths of consecutive zeroes, then replacing any runs longer than 1 with NA in the original vector:
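NArun0 = function(v){
  # run-length encode the "is zero" indicator
  vr = rle(v == 0)
  # flag a run only if it is a run of zeros longer than 1
  vr$values = vr$lengths > 1 & vr$values
  # expand the flags back to one logical per element of v
  rv = inverse.rle(vr)
  v[rv] = NA
  v
}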
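To make the run-length step concrete, here is a step-by-step sketch on the vector c(0, 1, 0, 0, 2, 0), printing the intermediate objects that NArun0 builds internally. It uses only base R (rle() and inverse.rle()); the variable names are the ones from the function.

```r
v   <- c(0, 1, 0, 0, 2, 0)
is0 <- v == 0                    # TRUE FALSE TRUE TRUE FALSE TRUE
vr  <- rle(is0)
vr$lengths                       # 1 1 2 1 1
vr$values                        # TRUE FALSE TRUE FALSE TRUE
# keep a run only if it is made of zeros and is longer than 1
vr$values <- vr$lengths > 1 & vr$values   # FALSE FALSE TRUE FALSE FALSE
# back to one flag per element of v
rv <- inverse.rle(vr)            # FALSE FALSE TRUE TRUE FALSE FALSE
v[rv] <- NA
v                                # 0 1 NA NA 2 0
```

Working on runs rather than on pairs of neighbours is what lets both the first and the last zero of a run be caught in one pass.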
And a bunch of simple tests:
> NArun0(c(0,0,0,0))
[1] NA NA NA NA
> NArun0(c(0,1,1,0))
[1] 0 1 1 0
> NArun0(c(0,1,0,2,0))
[1] 0 1 0 2 0
> NArun0(c(0,1,0,0,2,0))
[1] 0 1 NA NA 2 0
> NArun0(c(0,1,0,0,2,0,0))
[1] 0 1 NA NA 2 NA NA
> NArun0(c(0,0,1,0,0,2,0,0))
[1] NA NA 1 NA NA 2 NA NA
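Applied to the data in the question, a minimal sketch looks like this. It assumes the O3 column has already been read as numbers; if the file uses decimal commas as shown, read.table() with dec = "," handles that (the file's field separator is not known, so it is left out here). The values below are the O3 values from the question with commas turned into points, and the function is repeated so the snippet runs on its own.

```r
# same logic as NArun0 above
NArun0 <- function(v) {
  vr <- rle(v == 0)
  vr$values <- vr$lengths > 1 & vr$values
  v[inverse.rle(vr)] <- NA
  v
}

# O3 column from the question (one isolated zero, then a run of 12 zeros)
o3 <- c(6.30397494404564, 0, 1.99689085129112, 11.1542397707455,
        2.02173552751248, rep(0, 12), 3.57609276547571, 12.8699598468684)
df <- data.frame(`O3 (ug/m3)` = o3, check.names = FALSE)

# new column: isolated zero kept, every zero of the run set to NA
df$oz <- NArun0(df$`O3 (ug/m3)`)
df$oz
```

Unlike the ifelse() version in the question, this also sets the first zero of the run (the 23:00 row) to NA, because the whole run is flagged at once.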
Note that, for me, the code in the question returns NA for every zero, including the isolated one:
> v = c(1,0,2,3,4,0,0,0,0,9)
> ifelse(v==0 & lag(v)==0, NA, v)
[1] 1 NA 2 3 4 NA NA NA NA 9
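Assuming lag here is the base stats::lag (if dplyr is attached, it masks this with its own lag(), which shifts the values instead), the reason is that on a plain vector stats::lag only attaches shifted time attributes and leaves the values untouched. The condition therefore reduces to v == 0, as this quick check suggests:

```r
v  <- c(1, 0, 2, 3, 4, 0, 0, 0, 0, 9)
lv <- stats::lag(v)
tsp(lv)                        # time attributes: start and end moved back by one
identical(as.vector(lv), v)    # TRUE: the values themselves are unchanged
# hence "v == 0 & lag(v) == 0" is just "v == 0"
all((v == 0 & lv == 0) == (v == 0))   # TRUE
```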
I thought this might be because lag is meant for time series, but if I convert v to a time series:
> v = ts(v)
> ifelse(v==0 & lag(v)==0, NA, v)
Time Series:
Start = 1
End = 9
Frequency = 1
[1] 1 0 2 3 4 NA NA NA 0
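Now the isolated zero is kept, but so is the last zero of the run, and the final value (9) is dropped. That is because of how stats::lag works: it shifts the time base back by one step, so lag(v) at time t holds v[t+1], and comparing two time series keeps only their overlapping times (1 to 9). The code therefore compares each value with the next one, not the previous one. So I don't understand how the code in the question produces the output shown, where both the first (23:00) and the last (10:00) zero of the run are NA.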
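For comparison, here is a base-R sketch that builds the shifted vectors explicitly instead of relying on lag(). This is a neighbour-based alternative, not the rle method above, and the names prev, prev0 and next0 are only illustrative. A genuine previous-value lag (what dplyr::lag() gives) still leaves the first zero of each run untouched; flagging a zero when either neighbour is also zero catches whole runs, and for vectors without NA it gives the same result as NArun0:

```r
v <- c(1, 0, 2, 3, 4, 0, 0, 0, 0, 9)
n <- length(v)

is0   <- v == 0
prev  <- c(NA, v[-n])          # value before each element (NA for the first)
prev0 <- c(FALSE, is0[-n])     # previous element is zero
next0 <- c(is0[-1], FALSE)     # next element is zero

# the question's rule with a true previous-value lag:
# the first zero of the run (position 6) survives
res_lag <- ifelse(is0 & prev == 0, NA, v)
res_lag    # 1 0 2 3 4 0 NA NA NA 9

# zero with a zero neighbour on either side: the whole run becomes NA
res_both <- ifelse(is0 & (prev0 | next0), NA, v)
res_both   # 1 0 2 3 4 NA NA NA NA 9
```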