Reputation: 618
I have a soil properties data.table with values for different locations and depths. Some values are NA, so I'd like to compute, for each row, the mean of the values in the layers directly above and below it. For the top layer, I'd take the value from the next layer down.
I was able to create columns indicating the upper and lower layer for each row, and I thought about doing a self merge, but I'm completely lost as to how to proceed.
Any clues on how to do this? Below is an example data.table and what I'd like to achieve. The example has two locations with 3 layers each, but my real data has many locations, and some have more layers than others.
library(data.table)
# I was able to identify which are the bottom and top layers
# using a function to identify the neighbours
dt <- data.table(id = rep(c(1,2), 1, each = 3),
depth = c(10, 20, 30, 10, 20, 30),
val = c(12, 18, 11, 25, 27, 29),
bot_l = c(20, 30, NA, 20, 30, NA),
top_l = c(NA, 10, 20, NA, 10, 20))
# How can I calculate the average between the top and lower layers?
dt_desired <- data.table(id = rep(c(1,2), 1, each = 3),
depth = c(10, 20, 30, 10, 20, 30),
val = c(12, 18, 11, 25, 27, 29),
bot_l = c(20, 30, NA, 20, 30, NA),
top_l = c(NA, 10, 20, NA, 10, 20),
mean_top_bot = c(18, 11.5, 18, 27, 27, 27))
To explain a bit more: I calculated the mean_top_bot values by hand. That's why I had some errors there :facepalm:
Solution using self merge
I was able to merge the table with itself by changing the by.x and by.y parameters, but I have a feeling that I'm doing this in the worst way possible.
dt1 <- merge(dt, dt[, .SD, .SDcols = !c('bot_l', 'top_l')],
by.x = c('id', 'bot_l'),
by.y = c('id', 'depth'),
all = TRUE)[order(id, depth)]
id bot_l depth val.x top_l val.y
1: 1 20 10 12 NA 18
2: 1 30 20 18 10 11
3: 1 NA 30 11 20 NA
4: 1 10 NA NA NA 12
5: 2 20 10 25 NA 27
6: 2 30 20 27 10 29
7: 2 NA 30 29 20 NA
8: 2 10 NA NA NA 25
Are there any easier ways to do this?
Upvotes: 1
Views: 364
Reputation: 136
It should be easier to use data.table::shift directly, without computing the "top" and "bot" layers.
dt <- data.table(id = rep(c(1,2), 1, each = 3),
depth = c(10, 20, 30, 10, 20, 30),
val = c(12, 18, 11, 25, 27, 29))
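# shift(val, c(1, -1)) returns a list with the previous and the next value;
# rowMeans() with na.rm = TRUE averages whichever neighbours exist
dt[, v := rowMeans(data.table::setDT(data.table::shift(val, c(1, -1))),
                   na.rm = TRUE),
   by = id]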
The same with magrittr:
library(magrittr)
dt[, v := data.table::shift(val, c(1, -1)) %>% data.table::setDT() %>% rowMeans(na.rm = TRUE),
by = id]
The code above takes the mean of the previous and next val for a given depth. I am assuming that the top/bottom layers are exactly one row away and that the data are already ordered by id and depth, as in your example.
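If the ordering assumption might not hold, one option is to sort first and keep the two neighbours as explicit columns, which makes the result easier to check. This is only a sketch: the column names val_above, val_below and mean_top_bot are chosen here for illustration, and it assumes every id has at least two layers (a single-layer id would have no neighbours and end up with NaN).

```r
library(data.table)

dt <- data.table(id = rep(c(1, 2), each = 3),
                 depth = c(10, 20, 30, 10, 20, 30),
                 val = c(12, 18, 11, 25, 27, 29))

# Make sure rows are ordered by location and depth before shifting
setorder(dt, id, depth)

# shift(val, c(1, -1)) returns a list of two vectors per group:
# the value of the previous row (layer above) and of the next row (layer below)
dt[, c("val_above", "val_below") := shift(val, c(1, -1)), by = id]

# Average the neighbours that exist; na.rm = TRUE handles the top and
# bottom layers, where only one neighbour is available
dt[, mean_top_bot := rowMeans(cbind(val_above, val_below), na.rm = TRUE)]

print(dt)
```

The helper columns can be dropped afterwards with dt[, c("val_above", "val_below") := NULL].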
Upvotes: 3
Reputation: 42544
It took me a while to figure out, but this can be solved as well by a rolling mean:
dt[, mean_top_bot :=
zoo::rollapply(val, width = list(c(-1L, 1L)), FUN = mean, partial = TRUE), id][]
   id depth val bot_l top_l mean_top_bot
1:  1    10  12    20    NA           18
2:  1    20  18    30    10         11.5
3:  1    30  11    NA    20           18
4:  2    10  25    20    NA           27
5:  2    20  27    30    10           27
6:  2    30  29    NA    20           27
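To see what the call does within a single group, here is a minimal sketch on a plain vector holding the val column of id 1. The variable names val and neighbour_mean are made up for this illustration.

```r
library(zoo)

# val for id 1, ordered by depth
val <- c(12, 18, 11)

# Offsets -1 and 1 select the previous and the next element (not the current one);
# partial = TRUE drops offsets that fall outside the vector
neighbour_mean <- rollapply(val, width = list(c(-1L, 1L)),
                            FUN = mean, partial = TRUE)

print(neighbour_mean)
```

This should reproduce the first three values of mean_top_bot in the table above.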
Two characteristics of zoo::rollapply() come in handy:
The width argument alternatively takes a list of integer offsets. So, list(c(-1L, 1L)) refers to the values of the preceding and subsequent rows while omitting the current row.
With partial = TRUE, only the subset of indexes that are in range are passed to FUN. E.g., for the first row, offset -1 refers to index 0, which is out of range. Therefore, only the value of index 2 (offset 1) is passed to mean(). Likewise for the last row, where only the second to last value is passed to mean().
Upvotes: 2