Divide difference in consecutive rows by number of NA rows in between and reassign fraction to NA rows. R dplyr() mutate() lag()

Question

Using the dplyr() lag() function I'm finding the difference in 'x' between consecutive rows. For rows that have NA in 'x', the 'diff' is NA. For the next 'diff' the NA row seems to be excluded from the calculation:

df  %>%
  mutate(diff = x - lag(x))

date	x	diff
2021-01-01	0	0
2021-01-02	10	10
2021-01-02	30	20
2021-01-03	NA	NA
2021-01-04	60	30

To have a more complete data set I am assuming that between two entries there is a linear increase in 'x' when there are NA entries.

So, I would like to divide the 'diff' following an NA row by 1 + the number of NA rows that were 'skipped'. In this example, 1 NA row was skipped so I would want to divide 30/2 and enter 15 in the diff column for the NA row and the following row, as shown here.

date	x	diff
2021-01-01	0	0
2021-01-02	10	10
2021-01-02	30	20
2021-01-03	NA	15
2021-01-04	60	15

Martin Gal · Accepted Answer

You could use na.approx from package zoo:

library(dplyr)
library(zoo)

df %>% 
  mutate(diff = na.approx(x) - lag(na.approx(x)))

which gives you

# A tibble: 5 x 3
  date           x  diff
        
1 2021-01-01     0    NA
2 2021-01-02    10    10
3 2021-01-02    30    20
4 2021-01-03    NA    15
5 2021-01-04    60    15

With lag(x, default = 0) you can handle the NA at the beginning of your data.frame.

Divide difference in consecutive rows by number of NA rows in between and reassign fraction to NA rows. R dplyr() mutate() lag()

Answers (1)

Related Questions