fjurt
fjurt

Reputation: 793

Interpolate NA values with the mean for each row but only for one or two NA values between numerical values

I try to interpolate NA values for each row but I only want to interpolate the NA values if I have two or fewer NA values next to each other. So for example in row 3, there are three NAs next to each other, so I don't want to interpolate but in rows one and two there are two or fewer next to each other, so I aim to interpolate them linearly. Is there an efficient way to deal with it?

I have a dataset that looks like that:

df1:
   ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3    3   NA    4    4      si
2: a2      g3    5    5   NA   NA    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3   NA    9    8    8      mx
5: a5      2w    9    1   NA    5   NA      ix
6: a6     ps2    2   NA    7    4    4      p2
7: a7     kg2    6   NA   NA   NA    6      2q

For reproducibility:

df1 = data.table(
  ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
  "string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
  "2018" = c(3,5,11,3,9,2,6),
  "2019" = c(3,5,NA,NA,1,NA,NA),
  "2020" = c(NA,NA,NA,9,NA,7,NA),
  "2021" = c(4,NA,NA,8,5,4,NA),
  "2022" = c(4,1,3,8,NA,4,6),
  "string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))

I try to get a data.table which looks like that:

   ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3 3.00  3.5    4    4      si
2: a2      g3    5 5.00  4.3    3    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3 8.25  9.0    8    8      mx
5: a5      2w    9 1.00 -0.3    5   17      ix
6: a6     ps2    2 8.00  7.0    4    4      p2
7: a7     kg2    6   NA  NA    NA    6      2q

Thanks for any suggestions!

Upvotes: 0

Views: 76

Answers (1)

lovalery
lovalery

Reputation: 4652

Please find a solution (cf. reprex below) using data.table and imputeTS libraries.

Reprex

  • Code
library(data.table)
library(imputeTS)

results <- df1 %>% 
  transpose(., keep.names = 'rn') %>% 
  {.[3:nrow(df1), lapply(.SD, as.numeric),
  ][, lapply(.SD, na_interpolation, "spline", 2)]} %>% 
  round(., 2) %>%  
  transpose(., make.names = 'rn') %>% 
  cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
  setcolorder(., names(df1))
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.50     5     5      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-02 by the reprex package (v2.0.1)


Probably a better solution (cf. reprex below) using data.table and zoo libraries. This solution gives exactly the result you want (i.e. forget my comment under your question!)

Reprex

  • Code
library(data.table)
library(zoo)
library(magrittr) # for the pipes! 

results <- df1 %>% 
  transpose(., keep.names = 'rn') %>% 
  {.[3:nrow(df1), lapply(.SD, as.numeric),
  ][, lapply(.SD, na.spline, maxgap = 2)]} %>% 
  round(., 2) %>%  
  transpose(., make.names = 'rn') %>% 
  cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
  setcolorder(., names(df1))
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.33     5    17      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-03 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions