Reputation: 793
I try to interpolate NA values for each row but I only want to interpolate the NA values if I have two or fewer NA values next to each other. So for example in row 3, there are three NAs next to each other, so I don't want to interpolate but in rows one and two there are two or fewer next to each other, so I aim to interpolate them linearly. Is there an efficient way to deal with it?
I have a dataset that looks like that:
df1:
ID string1 2018 2019 2020 2021 2022 string2
1: a1 x2 3 3 NA 4 4 si
2: a2 g3 5 5 NA NA 1 q2
3: a3 n2 11 NA NA NA 3 oq
4: a4 m3 3 NA 9 8 8 mx
5: a5 2w 9 1 NA 5 NA ix
6: a6 ps2 2 NA 7 4 4 p2
7: a7 kg2 6 NA NA NA 6 2q
For reproducibility:
df1 = data.table(
ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
"string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
"2018" = c(3,5,11,3,9,2,6),
"2019" = c(3,5,NA,NA,1,NA,NA),
"2020" = c(NA,NA,NA,9,NA,7,NA),
"2021" = c(4,NA,NA,8,5,4,NA),
"2022" = c(4,1,3,8,NA,4,6),
"string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))
I try to get a data.table
which looks like that:
ID string1 2018 2019 2020 2021 2022 string2
1: a1 x2 3 3.00 3.5 4 4 si
2: a2 g3 5 5.00 4.3 3 1 q2
3: a3 n2 11 NA NA NA 3 oq
4: a4 m3 3 8.25 9.0 8 8 mx
5: a5 2w 9 1.00 -0.3 5 17 ix
6: a6 ps2 2 8.00 7.0 4 4 p2
7: a7 kg2 6 NA NA NA 6 2q
Thanks for any suggestions!
Upvotes: 0
Views: 76
Reputation: 4652
Please find a solution (cf. reprex below) using data.table
and imputeTS
libraries.
Reprex
library(data.table)
library(imputeTS)
results <- df1 %>%
transpose(., keep.names = 'rn') %>%
{.[3:nrow(df1), lapply(.SD, as.numeric),
][, lapply(.SD, na_interpolation, "spline", 2)]} %>%
round(., 2) %>%
transpose(., make.names = 'rn') %>%
cbind(.,df1[,c("ID", "string1", "string2")]) %>%
setcolorder(., names(df1))
results
#> ID string1 2018 2019 2020 2021 2022 string2
#> <char> <char> <num> <num> <num> <num> <num> <char>
#> 1: a1 x2 3 3.00 3.50 4 4 si
#> 2: a2 g3 5 5.00 4.33 3 1 q2
#> 3: a3 n2 11 NA NA NA 3 oq
#> 4: a4 m3 3 8.25 9.00 8 8 mx
#> 5: a5 2w 9 1.00 -0.50 5 5 ix
#> 6: a6 ps2 2 8.00 7.00 4 4 p2
#> 7: a7 kg2 6 NA NA NA 6 2q
Created on 2021-12-02 by the reprex package (v2.0.1)
Probably a better solution (cf. reprex below) using data.table
and zoo
libraries. This solution gives exactly the result you want (i.e. forget my comment under your question!)
Reprex
library(data.table)
library(zoo)
library(magrittr) # for the pipes!
results <- df1 %>%
transpose(., keep.names = 'rn') %>%
{.[3:nrow(df1), lapply(.SD, as.numeric),
][, lapply(.SD, na.spline, maxgap = 2)]} %>%
round(., 2) %>%
transpose(., make.names = 'rn') %>%
cbind(.,df1[,c("ID", "string1", "string2")]) %>%
setcolorder(., names(df1))
results
#> ID string1 2018 2019 2020 2021 2022 string2
#> <char> <char> <num> <num> <num> <num> <num> <char>
#> 1: a1 x2 3 3.00 3.50 4 4 si
#> 2: a2 g3 5 5.00 4.33 3 1 q2
#> 3: a3 n2 11 NA NA NA 3 oq
#> 4: a4 m3 3 8.25 9.00 8 8 mx
#> 5: a5 2w 9 1.00 -0.33 5 17 ix
#> 6: a6 ps2 2 8.00 7.00 4 4 p2
#> 7: a7 kg2 6 NA NA NA 6 2q
Created on 2021-12-03 by the reprex package (v2.0.1)
Upvotes: 1