TiberiusGracchus2020

Skip na_interpolation on dplyr group/variable pairs with full NAs in R

I have a data frame that looks like this:

   Country Year acnt_class     wages
3      AZE 2010         NA        NA
4      AZE 2011  0.4206776        NA
5      AZE 2012         NA        NA
6      AZE 2013         NA        NA
7      AZE 2014  0.7735889 0.4273174
8      AZE 2015         NA        NA
9      AZE 2016         NA        NA
10     AZE 2017  0.5108674 0.4335978
11     AZE 2018         NA        NA
15     BDI 2010         NA        NA
16     BDI 2011  0.3140646        NA
17     BDI 2012         NA        NA
18     BDI 2013         NA        NA
19     BDI 2014  0.1224175        NA
20     BDI 2015         NA        NA
21     BDI 2016         NA        NA
22     BDI 2017         NA        NA
23     BDI 2018         NA        NA
27     BEL 2010         NA        NA
28     BEL 2011  0.9576057        NA
29     BEL 2012         NA        NA
30     BEL 2013         NA        NA
31     BEL 2014  1.0083120 0.9623492
32     BEL 2015         NA        NA
33     BEL 2016         NA        NA
34     BEL 2017  1.0036910 0.9499486
35     BEL 2018         NA        NA

I'm trying to fill in the NAs with Stineman interpolation (option = "stine" in imputeTS::na_interpolation), separately for each Country, in both the "acnt_class" and "wages" columns:

library(dplyr)
library(imputeTS)

DF <- DF %>% 
  group_by(Country) %>% 
  mutate_at(.vars = c("acnt_class", "wages"), 
            .funs = ~na_interpolation(., option = "stine")) 

It works whenever every group has at least two non-NA observations in each column, but on this data I get the following error:

Error in na_interpolation(., option = "stine") : 
  Input data needs at least 2 non-NA data point for applying na_interpolation

This happens because the group "BDI" has only NAs in "wages".
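To see in advance which group/variable pairs will trigger this error, you can count the non-NA values per Country. A minimal base-R sketch, with the table above re-typed as a data.frame (the printed row names are dropped):

```r
# Data from the table above, re-typed so the example is reproducible
DF <- data.frame(
  Country = rep(c("AZE", "BDI", "BEL"), each = 9),
  Year = rep(2010:2018, times = 3),
  acnt_class = c(NA, 0.4206776, NA, NA, 0.7735889, NA, NA, 0.5108674, NA,
                 NA, 0.3140646, NA, NA, 0.1224175, NA, NA, NA, NA,
                 NA, 0.9576057, NA, NA, 1.0083120, NA, NA, 1.0036910, NA),
  wages = c(NA, NA, NA, NA, 0.4273174, NA, NA, 0.4335978, NA,
            rep(NA, 9),
            NA, NA, NA, NA, 0.9623492, NA, NA, 0.9499486, NA)
)

# Number of non-NA values for each Country (rows) and variable (columns)
n_obs <- sapply(c("acnt_class", "wages"), function(v) {
  tapply(DF[[v]], DF$Country, function(x) sum(!is.na(x)))
})
n_obs

# Group/variable pairs with fewer than 2 observations,
# i.e. the ones na_interpolation() rejects (here: BDI / wages)
which(n_obs < 2, arr.ind = TRUE)
```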

Ideally, I'm looking for a modified function that skips any group/variable pair with fewer than two non-NA observations (all NAs or a single observation) and leaves it unchanged. Any suggestions? Thanks!

Answers (2)

Omar Wasow

The answer provided by TiberiusGracchus2020 works well. In case it is helpful to anyone, I have turned that code snippet into a function with a lot of comments to make it clearer what's happening at each stage.

# Modified imputeTS::na_interpolation that
#   (1) doesn't break on all-NA (or single-value) vectors
#   (2) won't impute leading and trailing NAs
# Requires the imputeTS and zoo packages to be installed.

na_interpolation2 <- function(x, option = "linear") {
  total_not_missing <- sum(!is.na(x))

  # na_interpolation needs at least 2 non-NA values; otherwise return x unchanged
  if (total_not_missing < 2) {
    return(x)
  }

  # replace() takes an input vector, a TRUE/FALSE index vector and a replacement value
  replace(
    # input vector is the interpolated data;
    # this also imputes leading/trailing NAs, which we don't want
    imputeTS::na_interpolation(x, option = option),

    # TRUE/FALSE vector: zoo::na.approx() with na.rm = FALSE fills only
    # interior NAs, so the NAs it leaves are the leading/trailing ones
    is.na(zoo::na.approx(x, na.rm = FALSE)),

    # set those positions back to NA
    NA
  )
}

library(imputeTS)

# example data
data1 <- c(NA, NA, NA, NA, NA) 
data2 <- c(NA, NA, 1, NA, 3, NA)

na_interpolation(data1)
# Error in na_interpolation(data1) : Input data needs at 
# least 2 non-NA data point for applying na_interpolation

na_interpolation(data2)
# [1] 1 1 1 2 3 3

na_interpolation2(data1)
# [1] NA NA NA NA NA

na_interpolation2(data2)
# [1] NA NA  1  2  3 NA
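The replace(..., is.na(na.approx(x, na.rm = FALSE)), NA) step works because zoo::na.approx() with na.rm = FALSE interpolates interior NAs but leaves leading and trailing NAs in place, so its NA positions mark exactly the values you don't want extrapolated. If you'd rather not depend on zoo just for that mask, here is a base-R sketch of the same idea; edge_na_mask() is a hypothetical helper, not part of imputeTS or zoo, and the interpolated vector is copied from the na_interpolation(data2) output above to keep the example package-free:

```r
# Hypothetical helper (not from imputeTS or zoo): TRUE for NAs that lie
# before the first or after the last observed value of x
edge_na_mask <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) {
    return(rep(TRUE, length(x)))  # all NA: every position is "outside"
  }
  pos <- seq_along(x)
  pos < min(observed) | pos > max(observed)
}

data2 <- c(NA, NA, 1, NA, 3, NA)
edge_na_mask(data2)
# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE

# Same masking step as in na_interpolation2(), applied to the
# linear interpolation result shown above for data2
interpolated <- c(1, 1, 1, 2, 3, 3)
replace(interpolated, edge_na_mask(data2), NA)
# [1] NA NA  1  2  3 NA
```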

TiberiusGracchus2020

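Found a solution. It only interpolates between observed values: leading and trailing NAs, and group/variable pairs with fewer than two observations, are left as they were: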

library(imputeTS)
library(dplyr)
library(zoo)

DF <- DF %>% 
  group_by(Country) %>% 
  mutate_at(vars(acnt_class, wages),
            ~ if (sum(!is.na(.)) < 2) {
                .  # fewer than 2 observations: leave as is
              } else {
                # interpolate, then reset leading/trailing NAs to NA
                replace(na_interpolation(., option = "stine"),
                        is.na(na.approx(., na.rm = FALSE)), NA)
              })
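In dplyr 1.0.0 and later, mutate_at() is superseded by across(), and funs() is deprecated. A sketch of the same approach in that style, assuming dplyr >= 1.0, imputeTS (which uses stinepack for option = "stine") and zoo are installed; interpolate_inside() is a hypothetical helper name and the data is re-typed from the question:

```r
library(dplyr)

# Hypothetical helper: interpolate only between observed values.
# Vectors with fewer than 2 non-NA values are returned unchanged.
interpolate_inside <- function(x, option = "stine") {
  if (sum(!is.na(x)) < 2) {
    return(x)
  }
  replace(
    imputeTS::na_interpolation(x, option = option),
    is.na(zoo::na.approx(x, na.rm = FALSE)),  # leading/trailing NAs
    NA
  )
}

# Data from the question
DF <- data.frame(
  Country = rep(c("AZE", "BDI", "BEL"), each = 9),
  Year = rep(2010:2018, times = 3),
  acnt_class = c(NA, 0.4206776, NA, NA, 0.7735889, NA, NA, 0.5108674, NA,
                 NA, 0.3140646, NA, NA, 0.1224175, NA, NA, NA, NA,
                 NA, 0.9576057, NA, NA, 1.0083120, NA, NA, 1.0036910, NA),
  wages = c(NA, NA, NA, NA, 0.4273174, NA, NA, 0.4335978, NA,
            rep(NA, 9),
            NA, NA, NA, NA, 0.9623492, NA, NA, 0.9499486, NA)
)

# Apply the helper per Country to both columns
DF_filled <- DF %>%
  group_by(Country) %>%
  mutate(across(c(acnt_class, wages), interpolate_inside)) %>%
  ungroup()
```

Passing a named function to across() keeps the pipeline readable and lets you test the helper on plain vectors, as in the na_interpolation2() answer.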
