Reputation: 72
I have two datasets, with timesteps t and height h, which I merged.
library(data.table)
dataset_a <- data.table(t=rep(c(1,2,3,4,5,6,7,8,9), each=5),
h=rep(c(1:5)),
v=c(1:(5*9)))
The other one has measurement gaps (timesteps with no measurement at all), as well as zeros where we did measure but detected nothing.
dataset_b <- data.table(t=rep(c(1,2,4,5,6,8,9), each=5),
h=rep(c(1:5)),
w=c(1:(5*7)))
dataset_b$w[12:20] <- 0
merging:
dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))
Now I want to fill the gaps. How do I tell the data.table to use the neighboring values to fill the pixel?
dataset_merged[is.na(w),
w:= mean(c(the value at this h one timestep earlier, the value at this h one timestep later))]
Thanks a lot!
Edit: After Ben's very helpful comment I had to adjust the reproducible example. His solution works, but not if the 'framing' data is missing, for example if the first timestep is absent:
dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
h=rep(c(1:5)),
w=c(1:(5*6)))
#removed the first timestep in this case
dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))
library(zoo)
dataset_merged[order(h,t)][, w := na.approx(w)]
yields
Error in `[.data.table`(dataset_merged[order(h, t)], , `:=`(w, na.approx(w))) :
Supplied 44 items to be assigned to 45 items of column 'w'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
It would be ok to keep those as NA, but how do I make this clear to the function? Unfortunately the original data is not on a regular grid.
Upvotes: 1
Views: 167
Reputation: 30494
Perhaps try this approach. Order the data table by h and t before interpolation, and convert w to numeric so it can hold decimal values. Then use approx (base R), grouped with by = h.
dataset_merged[order(h,t)][, w:= as.numeric(w)][, w := approx(.I, w, .I)$y, by = h]
Output
t h v w
1: 1 1 1 NA
2: 2 1 6 1.0
3: 3 1 11 3.5
4: 4 1 16 6.0
5: 5 1 21 11.0
6: 6 1 26 16.0
7: 7 1 31 18.5
8: 8 1 36 21.0
9: 9 1 41 26.0
10: 1 2 2 NA
11: 2 2 7 2.0
12: 3 2 12 4.5
13: 4 2 17 7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3 3 NA
20: 2 3 8 3.0
21: 3 3 13 5.5
22: 4 3 18 8.0
23: 5 3 23 13.0
24: 6 3 28 18.0
25: 7 3 33 20.5
26: 8 3 38 23.0
27: 9 3 43 28.0
28: 1 4 4 NA
29: 2 4 9 4.0
30: 3 4 14 6.5
31: 4 4 19 9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5 5 NA
38: 2 5 10 5.0
39: 3 5 15 7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
t h v w
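Two details may matter in practice. First, dataset_merged[order(h,t)] returns a reordered copy, so the chained := calls modify that copy and leave dataset_merged itself unchanged unless the result is assigned. Second, approx(.I, w, .I) interpolates along the row index, which only matches the time axis when the timesteps are evenly spaced. The following is a minimal sketch of a variant, not a definitive implementation: it assumes that in-place sorting with setorder is acceptable and that t is the right x coordinate for the interpolation.

```r
library(data.table)

# Rebuild the example from the question (first timestep missing in dataset_b)
dataset_a <- data.table(t = rep(as.numeric(1:9), each = 5),
                        h = rep(1:5, times = 9),
                        v = 1:45)
dataset_b <- data.table(t = rep(c(2, 4, 5, 6, 8, 9), each = 5),
                        h = rep(1:5, times = 6),
                        w = 1:30)
dataset_merged <- merge(dataset_a, dataset_b, all = TRUE, by = c("t", "h"))

# Sort in place so the later := calls act on dataset_merged itself
setorder(dataset_merged, h, t)

# Convert first: the column type cannot change inside a grouped :=
dataset_merged[, w := as.numeric(w)]

# Interpolate along the actual time values within each height;
# points outside the measured range stay NA (approx default, rule = 1)
dataset_merged[, w := approx(x = t, y = w, xout = t)$y, by = h]
```

With the evenly spaced t of this example the values are the same as in the output above; the difference only shows up on an irregular time grid.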
Additional (per OP): If there is a group with only NA values for w, it has to be excluded.
Edit (5/28/20): To avoid calling approx when fewer than 2 values are available for interpolation, you can also try:
dataset_merged[order(h,t)
][, w:= as.numeric(w)
][, w := if(length(na.omit(w)) < 2) w else approx(.I, w, .I)$y, by = h]
Test case:
dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
h=1:5,
w=1:30)
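# set w to NA for every row with h == 3 (the logical mask is recycled over all rows)
dataset_b$w[c(FALSE, FALSE, TRUE, FALSE, FALSE)] <- NA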
dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))
Output
t h v w
1: 1 1 1 NA
2: 2 1 6 1.0
3: 3 1 11 3.5
4: 4 1 16 6.0
5: 5 1 21 11.0
6: 6 1 26 16.0
7: 7 1 31 18.5
8: 8 1 36 21.0
9: 9 1 41 26.0
10: 1 2 2 NA
11: 2 2 7 2.0
12: 3 2 12 4.5
13: 4 2 17 7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3 3 NA
20: 2 3 8 NA
21: 3 3 13 NA
22: 4 3 18 NA
23: 5 3 23 NA
24: 6 3 28 NA
25: 7 3 33 NA
26: 8 3 38 NA
27: 9 3 43 NA
28: 1 4 4 NA
29: 2 4 9 4.0
30: 3 4 14 6.5
31: 4 4 19 9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5 5 NA
38: 2 5 10 5.0
39: 3 5 15 7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
t h v w
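If you would rather keep zoo::na.approx from the question, the length error there appears to come from its default na.rm = TRUE, which drops leading and trailing NAs and so returns 44 values for 45 rows. A minimal sketch, assuming zoo is installed and that leaving the edge values as NA is acceptable, passes na.rm = FALSE and groups by h:

```r
library(data.table)
library(zoo)

# Example from the question, with the first timestep missing in dataset_b
dataset_a <- data.table(t = rep(as.numeric(1:9), each = 5),
                        h = rep(1:5, times = 9),
                        v = 1:45)
dataset_b <- data.table(t = rep(c(2, 4, 5, 6, 8, 9), each = 5),
                        h = rep(1:5, times = 6),
                        w = 1:30)
dataset_merged <- merge(dataset_a, dataset_b, all = TRUE, by = c("t", "h"))

setorder(dataset_merged, h, t)
dataset_merged[, w := as.numeric(w)]

# na.rm = FALSE keeps leading/trailing NAs, so the result has one value per row;
# x = t interpolates along the time values rather than the vector position
dataset_merged[, w := na.approx(w, x = t, na.rm = FALSE), by = h]
```

Grouping by h also keeps the interpolation from running across the boundary between two heights, which the ungrouped call in the question would do.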
Upvotes: 2