Reputation: 85
Here is the sample dataframe:
df <- data.frame(
id = c("A", "A", "A", "A", "B", "B", "B", "B"),
num = c(1, NA, 6, 3, 7, NA , NA, 2))
How do I get forward and backward difference between rows over id category? There should be two new columns: one difference between between current raw and previous, and the other should be difference between current raw and next raw. If the previous raw is NA then it should calculate the difference between current row and the first previous raw that contains real number. The same holds for the other forward difference case.
Many thanks!!
Upvotes: 1
Views: 367
Reputation: 28695
require(magrittr)
df$backdiff <- c(NA, sapply(2:nrow(df),
function(i){
df$num[i] - df$num[(i-1):1] %>% .[!is.na(.)][1]
}))
df$forward.diff <- c(sapply(2:nrow(df) - 1,
function(i){
df$num[i] - df$num[(i+1):nrow(df)] %>% .[!is.na(.)][1]
}), NA)
Upvotes: 1
Reputation: 20095
One solution could be achieved by using fill
function from tidyr
to create two columns (one each for prev and next calculation) where NA
values are removed.
df <- data.frame(
id = c("A", "A", "A", "A", "B", "B", "B", "B"),
num = c(1, NA, 6, 3, 7, NA , NA, 2))
library("tidyverse")
df %>% mutate(dup_num_prv = num, dup_num_nxt = num) %>%
group_by(id) %>%
fill(dup_num_prv, .direction = "down") %>%
fill(dup_num_nxt, .direction = "up") %>%
mutate(prev_diff = ifelse(is.na(num), NA, num - lag(dup_num_prv))) %>%
mutate(next_diff = ifelse(is.na(num), NA, num - lead(dup_num_nxt))) %>%
as.data.frame()
# Result is shown in columns 'prev_diff' and 'next_diff'
# id num dup_num_prv dup_num_nxt prev_diff next_diff
#1 A 1 1 1 NA -5
#2 A NA 1 6 NA NA
#3 A 6 6 6 5 3
#4 A 3 3 3 -3 NA
#5 B 7 7 7 NA 5
#6 B NA 7 2 NA NA
#7 B NA 7 2 NA NA
#8 B 2 2 2 -5 NA
Note: There are few queries which OP needs to clarify. The solution can be fine-tuned afterwards. dup_num_prv
and dup_num_nxt
are kept just for understanding purpose. These column can be removed.
Upvotes: 0