Rachel Pals

Reputation: 1

In R: how to conditionally use rollapply based on the number of NA values in the window?

Basic Problem:

I'm new to R (and to programming in general), so I apologize if this post is not well formatted. I am currently using R to analyze weather data. In short, I need to take the moving average of minimum temperatures for each city, but the calculation should still run when there are up to five NAs in the width = 31 rolling window I have specified.

To do the basic rolling mean calculation, I've been using rollapply from the zoo package:

library("dplyr")
library("zoo")

Also, if anyone wants to show me how to generate some random sample data for future questions, that would be super helpful. My data is a dataframe with three columns: Year (integer), City (character; in this case all are "KASLO"), and MinTemp (numeric, with some NA values). The name of the dataframe is the same as the value in the "City" column ("KASLO").

Basic code I've been using to get the moving average without conditions on number of NA values:

MA <- rollapply(KASLO$MinTemp, width = 31, mean, fill = NA)
KASLO <- mutate(KASLO, "Moving Average" = MA)

This is a good start, but due to the nature of the data there are gaps throughout the years. I need the program to provide me with an output even if there are up to 5 NA values within the rolling window. So, for example, if there were 5 NAs in the 31 width window for a year, the code would compute the moving average using the 26 existing values. Currently, the output gives NA unless the window has zero NA values.

I have tried to do the following (and other variations), to no avail:

MA <- rollapply(KASLO$MinTemp, width = 31, function (x) if(length(which(!is.na(x))) >= 26) { mean(x) }, fill = NA)
KASLO <- mutate(KASLO, "Moving Average" = MA)

This provides the same output as if I had not added the function/if statement (i.e., it only calculates the moving average if there are no NAs present).

Any help on this task is much appreciated!

Upvotes: 0

Views: 931

Answers (1)

denis

Reputation: 5673

Here is an example dataset:

set.seed(246)
test <- sample(c(0, 1), 100, replace = TRUE)
test[sample(1:100,50)] <- NA  
test  

  [1] NA  0 NA NA NA  1  1  0  1  1  1 NA NA  0  1  1  0 NA  1  0 NA  1  1  1 NA  1  1  1 NA  0 NA  0 NA  0 NA NA  0  0
 [39] NA  1  0  0 NA NA NA NA NA  1  1 NA NA  1 NA NA  1  1  0 NA  0  1 NA  1 NA NA NA  0 NA NA NA NA  0  0  0  1  0 NA
 [77] NA NA NA NA NA NA  1 NA NA NA NA NA  1  1  0 NA NA  1  0  0 NA NA  0 NA
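
For data shaped more like the question's (columns Year, City, MinTemp), a similar sketch in base R might look like the following. The year range, the temperature distribution, and the number of missing values are made-up assumptions, chosen purely for illustration.

```r
set.seed(246)

n <- 100  # number of rows (arbitrary)

KASLO <- data.frame(
  Year    = 1900:(1900 + n - 1),                    # hypothetical consecutive years
  City    = "KASLO",
  MinTemp = round(rnorm(n, mean = -5, sd = 3), 1),  # made-up temperatures
  stringsAsFactors = FALSE
)

# Blank out 15 randomly chosen values to mimic gaps in the record
KASLO$MinTemp[sample(n, 15)] <- NA

str(KASLO)
```

Calling set.seed() first means anyone who runs the snippet gets the same "random" data, which is handy when posting a question.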

And here is a solution for a window of 10 that calculates the mean if there are fewer than 5 NAs in the window and returns NA otherwise:

library(zoo)

rollapply(test, width = 10, function(x) {
  # Return NA when more than 4 values in the window are missing
  if (sum(is.na(x)) > 4) {
    NA
  } else {
    mean(x, na.rm = TRUE)
  }
}, fill = NA)


  [1]        NA        NA        NA        NA 0.6666667 0.7142857 0.8333333 0.8333333 0.7142857 0.7500000 0.7500000
 [12] 0.6250000 0.7142857 0.7142857 0.5714286 0.5000000 0.5714286 0.6250000 0.7500000 0.7142857 0.7142857 0.8571429
 [23] 0.8750000 0.8571429 0.8571429 0.8571429 0.7142857 0.6666667 0.5000000 0.5000000        NA        NA        NA
 [34]        NA        NA 0.1666667 0.1666667 0.1666667        NA        NA        NA        NA        NA        NA
 [45]        NA        NA        NA        NA        NA        NA        NA 0.8333333        NA        NA 0.6666667
 [56] 0.6666667 0.6666667 0.6666667 0.6666667        NA        NA        NA        NA        NA        NA        NA
 [67]        NA        NA        NA 0.1666667        NA        NA        NA        NA        NA        NA        NA
 [78]        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
 [89]        NA        NA 0.5000000 0.5000000 0.5000000 0.3333333        NA        NA        NA        NA        NA
[100]        NA
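
Adapted to the numbers in the question (a 31-wide window that tolerates up to 5 NAs), the same idea might look like the sketch below. The attempt in the question calls mean(x) without na.rm = TRUE, so any window that contains an NA still yields NA; its if also has no else branch, so it is safer to return NA explicitly. The helper name mean_if_enough, the column name MovingAverage, and the simulated KASLO data are assumptions for illustration only, not the asker's real data.

```r
library(zoo)

set.seed(246)

# Simulated stand-in for the question's data: made-up temperatures with gaps
KASLO <- data.frame(
  Year    = 1900:1999,
  City    = "KASLO",
  MinTemp = rnorm(100, mean = -5, sd = 3)
)
KASLO$MinTemp[sample(100, 15)] <- NA

# Hypothetical helper: mean of the non-NA values, or NA when more than
# max_na values in the window are missing
mean_if_enough <- function(x, max_na = 5) {
  if (sum(is.na(x)) > max_na) NA_real_ else mean(x, na.rm = TRUE)
}

# Centered 31-wide rolling mean; fill = NA pads the ends so the result
# has one value per row of the data frame
KASLO$MovingAverage <- rollapply(KASLO$MinTemp, width = 31,
                                 FUN = mean_if_enough, fill = NA)
```

With a centered window of width 31, the first and last 15 rows are always NA because the window does not fit there; everywhere else the value is NA only when more than 5 of the 31 observations are missing.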

Upvotes: 1
