Alisher Narzulloyev
Alisher Narzulloyev

Reputation: 85

Forward and backward difference between rows with missing values

Here is the sample dataframe:

df <- data.frame(
 id = c("A", "A", "A", "A", "B", "B", "B", "B"), 
 num = c(1, NA, 6, 3, 7, NA , NA, 2))

How do I get forward and backward difference between rows over id category? There should be two new columns: one difference between between current raw and previous, and the other should be difference between current raw and next raw. If the previous raw is NA then it should calculate the difference between current row and the first previous raw that contains real number. The same holds for the other forward difference case.

Many thanks!!

Upvotes: 1

Views: 367

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28695

require(magrittr)

df$backdiff <- c(NA, sapply(2:nrow(df), 
                            function(i){
                                df$num[i] - df$num[(i-1):1] %>% .[!is.na(.)][1]
                            }))

df$forward.diff <- c(sapply(2:nrow(df) - 1, 
                            function(i){
                                df$num[i] - df$num[(i+1):nrow(df)] %>% .[!is.na(.)][1]
                            }), NA)

Upvotes: 1

MKR
MKR

Reputation: 20095

One solution could be achieved by using fill function from tidyr to create two columns (one each for prev and next calculation) where NA values are removed.

df <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"), 
  num = c(1, NA, 6, 3, 7, NA , NA, 2))

  library("tidyverse")

  df %>% mutate(dup_num_prv = num, dup_num_nxt = num) %>%
  group_by(id) %>%
  fill(dup_num_prv, .direction = "down")  %>%
  fill(dup_num_nxt, .direction = "up")  %>%
  mutate(prev_diff = ifelse(is.na(num), NA, num - lag(dup_num_prv)))  %>%
  mutate(next_diff = ifelse(is.na(num), NA, num - lead(dup_num_nxt))) %>%
  as.data.frame()

 # Result is shown in columns 'prev_diff' and 'next_diff'

#  id num dup_num_prv dup_num_nxt prev_diff next_diff
#1  A   1           1           1        NA        -5
#2  A  NA           1           6        NA        NA
#3  A   6           6           6         5         3
#4  A   3           3           3        -3        NA
#5  B   7           7           7        NA         5
#6  B  NA           7           2        NA        NA
#7  B  NA           7           2        NA        NA
#8  B   2           2           2        -5        NA

Note: There are few queries which OP needs to clarify. The solution can be fine-tuned afterwards. dup_num_prv and dup_num_nxtare kept just for understanding purpose. These column can be removed.

Upvotes: 0

Related Questions