Alex

Reputation: 31

Linear interpolation of missing values in R

I want to fill in the missing values with interpolation, but I do not know how to do it. My data frame looks like this:

   fecha      hora_prog prev_solfot_h3 prev_eol_h3
 1 2019-01-01         1              0        3156
 2 2019-01-01         2              0        3134
 3 2019-01-01         3              0        3150
 4 2019-01-01         4              1        3259
 5 2019-01-01         5              2        3265
 6 2019-01-01         6              2        3293
 7 2019-01-01         7              3        3326
 8 2019-01-01         8             35        3241
 9 2019-01-01         9           68.4        3183
10 2019-01-01        10            759        3090
11 2019-01-01        11             NA          NA
12 2019-01-01        12             NA          NA
13 2019-01-01        13             NA          NA
14 2019-01-01        14             NA          NA
15 2019-01-01        15             45        3326
16 2019-01-01        16             34        3156
17 2019-01-01        17             56        3134
18 2019-01-01        18             33        3150
19 2019-01-01        19             10        3259
20 2019-01-01        20              2        3265
21 2019-01-01        21              0        3156
22 2019-01-01        22              0        3134
23 2019-01-01        23              0        3150
24 2019-01-01        24              0        3259
25 2019-01-02         1              0        3265
...
with 19,693 more rows

There are more rows with NAs in prev_solfot_h3 and prev_eol_h3, sometimes in both columns at once and sometimes in only one of them (the other column has a value in that row). I want to fill in the missing values by linear interpolation, but I don't know how. If there is a better method than linear interpolation for estimating these values, I'd be glad to hear about it too. I'm new to R and RStudio and a bit lost here, so any help is appreciated. Thank you!
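As background for the question: linear interpolation fills each gap on the straight line between the last known value before it and the first known value after it. A minimal base R check of that formula on the first gap in prev_solfot_h3 (row 10 = 759, row 15 = 45); x0, y0, x1, y1 and x are illustrative names, not columns of the data:

```r
# Known neighbours of the gap in prev_solfot_h3 (rows 10 and 15 of the sample)
x0 <- 10; y0 <- 759
x1 <- 15; y1 <- 45

# Row positions to fill
x <- 11:14

# Point on the straight line through (x0, y0) and (x1, y1)
y <- y0 + (y1 - y0) * (x - x0) / (x1 - x0)
print(y)

# Base R's approx() does the same linear interpolation
y_approx <- approx(c(x0, x1), c(y0, y1), xout = x)$y
print(all.equal(y, y_approx))
```

The four values agree with rows 11-14 of prev_solfot_h3 in the output of the approx() answer.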

Upvotes: 2

Views: 1772

Answers (2)

G. Grothendieck

Reputation: 270268

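Use na.approx from the zoo package:

library(zoo)
# Replace the NAs in columns 2-4 by linear interpolation, treating rows as equally spaced
DF[2:4] <- na.approx(DF[2:4])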
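If adding a package dependency is not an option, the same column-by-column idea can be sketched in base R with approx(). This is a swapped-in technique, not how zoo implements na.approx; the helper interp_na and the small DF below are hypothetical, built from rows 9-16 of the question's data. Leading or trailing NAs are left as NA because approx() uses rule = 1 by default.

```r
# Hypothetical helper: linearly interpolate the NAs in a numeric vector,
# using the row position as the x-axis (rows assumed equally spaced in time)
interp_na <- function(y) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)  # approx() needs at least two known points
  approx(which(ok), y[ok], xout = seq_along(y))$y
}

# Small sample shaped like the question's data (rows 9-16)
DF <- data.frame(
  fecha = "2019-01-01",
  hora_prog = 9:16,
  prev_solfot_h3 = c(68.4, 759, NA, NA, NA, NA, 45, 34),
  prev_eol_h3 = c(3183L, 3090L, NA, NA, NA, NA, 3326L, 3156L)
)

# Only touch numeric columns that actually contain NAs
na_cols <- vapply(DF, function(col) is.numeric(col) && anyNA(col), logical(1))
DF[na_cols] <- lapply(DF[na_cols], interp_na)
print(DF)
```

Restricting the helper to columns that contain NAs leaves fecha and hora_prog exactly as they were.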

Upvotes: 3

ThomasIsCoding

Reputation: 102710

You can try base R's approx, like below:

# Interpolate each column over the row position, using only its non-NA rows
transform(df,
  prev_solfot_h3 = approx(seq_along(hora_prog)[!is.na(prev_solfot_h3)],
                          prev_solfot_h3[!is.na(prev_solfot_h3)],
                          seq_along(hora_prog))$y,
  prev_eol_h3 = approx(seq_along(hora_prog)[!is.na(prev_eol_h3)],
                       prev_eol_h3[!is.na(prev_eol_h3)],
                       seq_along(hora_prog))$y
)
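One behaviour worth knowing with this approach: approx() defaults to rule = 1, so an NA before the first or after the last observed value in a column stays NA. Passing rule = 2 to each approx() call in the transform() above instead copies the nearest observed value into those edge positions (a constant carry, not a true interpolation). A small base R check on a made-up vector:

```r
y <- c(NA, 10, NA, 30, NA)  # made-up series with NAs at both ends
pos <- seq_along(y)
ok <- !is.na(y)

# rule = 1 (default): only interior NAs are filled
filled_rule1 <- approx(pos[ok], y[ok], xout = pos)$y
# rule = 2: edge NAs take the nearest observed value
filled_rule2 <- approx(pos[ok], y[ok], xout = pos, rule = 2)$y

print(filled_rule1)  # [1] NA 10 20 30 NA
print(filled_rule2)  # [1] 10 10 20 30 30
```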

which gives

        fecha hora_prog prev_solfot_h3 prev_eol_h3
1  2019-01-01         1            0.0      3156.0
2  2019-01-01         2            0.0      3134.0
3  2019-01-01         3            0.0      3150.0
4  2019-01-01         4            1.0      3259.0
5  2019-01-01         5            2.0      3265.0
6  2019-01-01         6            2.0      3293.0
7  2019-01-01         7            3.0      3326.0
8  2019-01-01         8           35.0      3241.0
9  2019-01-01         9           68.4      3183.0
10 2019-01-01        10          759.0      3090.0
11 2019-01-01        11          616.2      3137.2
12 2019-01-01        12          473.4      3184.4
13 2019-01-01        13          330.6      3231.6
14 2019-01-01        14          187.8      3278.8
15 2019-01-01        15           45.0      3326.0
16 2019-01-01        16           34.0      3156.0
17 2019-01-01        17           56.0      3134.0
18 2019-01-01        18           33.0      3150.0
19 2019-01-01        19           10.0      3259.0
20 2019-01-01        20            2.0      3265.0
21 2019-01-01        21            0.0      3156.0
22 2019-01-01        22            0.0      3134.0
23 2019-01-01        23            0.0      3150.0
24 2019-01-01        24            0.0      3259.0
25 2019-01-02         1            0.0      3265.0

Data

> dput(df)
structure(list(fecha = c("2019-01-01", "2019-01-01", "2019-01-01", 
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-02"), hora_prog = c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 1L), prev_solfot_h3 = c(0, 0, 0,
1, 2, 2, 3, 35, 68.4, 759, NA, NA, NA, NA, 45, 34, 56, 33, 10,
2, 0, 0, 0, 0, 0), prev_eol_h3 = c(3156L, 3134L, 3150L, 3259L,
3265L, 3293L, 3326L, 3241L, 3183L, 3090L, NA, NA, NA, NA, 3326L,
3156L, 3134L, 3150L, 3259L, 3265L, 3156L, 3134L, 3150L, 3259L,
3265L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25"))

Upvotes: 0
