Reputation: 31
I want to fill in the missing values with interpolation, but I do not know how to do it. My data frame looks like this:
fecha hora_prog prev_solfot_h3 prev_eol_h3
1 2019-01-01 1 0 3156
2 2019-01-01 2 0 3134
3 2019-01-01 3 0 3150
4 2019-01-01 4 1 3259
5 2019-01-01 5 2 3265
6 2019-01-01 6 2 3293
7 2019-01-01 7 3 3326
8 2019-01-01 8 35 3241
9 2019-01-01 9 68.4 3183
10 2019-01-01 10 759. 3090
11 2019-01-01 11 NA NA
12 2019-01-01 12 NA NA
13 2019-01-01 13 NA NA
14 2019-01-01 14 NA NA
15 2019-01-01 15 45 3326
16 2019-01-01 16 34 3156
17 2019-01-01 17 56 3134
18 2019-01-01 18 33 3150
19 2019-01-01 19 10 3259
20 2019-01-01 20 2 3265
21 2019-01-01 21 0 3156
22 2019-01-01 22 0 3134
23 2019-01-01 23 0 3150
24 2019-01-01 24 0 3259
25 2019-01-02 1 0 3265
# ... with 19,693 more rows
There are more rows with NAs in prev_solfot_h3 and in prev_eol_h3, sometimes in both columns at once and sometimes in only one (the other column has a value in that row). I want to fill in the missing values by linear interpolation, but I do not know how to do it. If there is a better method than linear interpolation I would be glad to hear about it; I am new to RStudio and a bit lost here, so any help is appreciated. Thank you!
Upvotes: 2
Views: 1772
Reputation: 270268
Use na.approx in zoo:
library(zoo)
DF[2:4] <- na.approx(DF[2:4])
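A slightly fuller sketch of the same idea, in case it helps. The toy data frame below is made up for illustration (only the column names come from the question), and selecting the columns by name through `cols` is my own choice rather than part of the one-liner above. Passing `na.rm = FALSE` keeps leading or trailing NAs in place, so each interpolated column keeps its original length:

```r
library(zoo)

# Toy data with the same column names as the question (values are illustrative)
DF <- data.frame(
  fecha = "2019-01-01",
  hora_prog = 1:6,
  prev_solfot_h3 = c(0, 10, NA, NA, 40, NA),
  prev_eol_h3 = c(3000, NA, 3200, 3300, NA, NA)
)

# Columns to interpolate, chosen by name instead of by position
cols <- c("prev_solfot_h3", "prev_eol_h3")

# na.rm = FALSE leaves leading/trailing NAs as NA instead of dropping them,
# so the result can be assigned back into the data frame
DF[cols] <- lapply(DF[cols], na.approx, na.rm = FALSE)

print(DF)
```

Interior gaps are filled linearly between the nearest known values; NAs at the very end of a column have no right-hand neighbour and stay NA.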
Upvotes: 3
Reputation: 102710
You can try approx from base R, like below:
transform(
df,
prev_solfot_h3 = approx(seq_along(hora_prog)[!is.na(prev_solfot_h3)], prev_solfot_h3[!is.na(prev_solfot_h3)], seq_along(hora_prog))$y,
prev_eol_h3 = approx(seq_along(hora_prog)[!is.na(prev_eol_h3)], prev_eol_h3[!is.na(prev_eol_h3)], seq_along(hora_prog))$y
)
which gives
fecha hora_prog prev_solfot_h3 prev_eol_h3
1 2019-01-01 1 0.0 3156.0
2 2019-01-01 2 0.0 3134.0
3 2019-01-01 3 0.0 3150.0
4 2019-01-01 4 1.0 3259.0
5 2019-01-01 5 2.0 3265.0
6 2019-01-01 6 2.0 3293.0
7 2019-01-01 7 3.0 3326.0
8 2019-01-01 8 35.0 3241.0
9 2019-01-01 9 68.4 3183.0
10 2019-01-01 10 759.0 3090.0
11 2019-01-01 11 616.2 3137.2
12 2019-01-01 12 473.4 3184.4
13 2019-01-01 13 330.6 3231.6
14 2019-01-01 14 187.8 3278.8
15 2019-01-01 15 45.0 3326.0
16 2019-01-01 16 34.0 3156.0
17 2019-01-01 17 56.0 3134.0
18 2019-01-01 18 33.0 3150.0
19 2019-01-01 19 10.0 3259.0
20 2019-01-01 20 2.0 3265.0
21 2019-01-01 21 0.0 3156.0
22 2019-01-01 22 0.0 3134.0
23 2019-01-01 23 0.0 3150.0
24 2019-01-01 24 0.0 3259.0
25 2019-01-02 1 0.0 3265.0
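If there are many columns, the repeated approx call can be wrapped in a small function. This is only a sketch: `interp_na` and `df_small` are hypothetical names introduced here, and the toy data reuses just the values around the gap (hours 10 to 15) from the question.

```r
# Hypothetical helper: linearly interpolate NAs in a numeric vector,
# using the row position as the x axis (as in the transform() call above)
interp_na <- function(v) {
  idx <- seq_along(v)
  ok <- !is.na(v)
  if (sum(ok) < 2) return(v)  # approx() needs at least two known points
  approx(idx[ok], v[ok], xout = idx)$y
}

# Toy data: the known values on either side of the gap in the question
df_small <- data.frame(
  hora_prog = 10:15,
  prev_solfot_h3 = c(759, NA, NA, NA, NA, 45),
  prev_eol_h3 = c(3090, NA, NA, NA, NA, 3326)
)

cols <- c("prev_solfot_h3", "prev_eol_h3")
df_small[cols] <- lapply(df_small[cols], interp_na)

print(df_small)
```

With the default `rule = 1`, approx returns NA for positions before the first or after the last known value, so leading and trailing gaps are left untouched rather than extrapolated.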
Data
> dput(df)
structure(list(fecha = c("2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-02"), hora_prog = c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 1L), prev_solfot_h3 = c(0, 0, 0,
1, 2, 2, 3, 35, 68.4, 759, NA, NA, NA, NA, 45, 34, 56, 33, 10,
2, 0, 0, 0, 0, 0), prev_eol_h3 = c(3156L, 3134L, 3150L, 3259L,
3265L, 3293L, 3326L, 3241L, 3183L, 3090L, NA, NA, NA, NA, 3326L,
3156L, 3134L, 3150L, 3259L, 3265L, 3156L, 3134L, 3150L, 3259L,
3265L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25"))
Upvotes: 0