vorpal
vorpal

Reputation: 320

Taking the average of lead and lag dates in dplyr mutate

I am trying to find the average of the date of the row above and of the row below. However, lag(Date) and lead(Date) are producing vectors which mean() then refuses to work with returning NA instead (figuring that out took me faaaaaar too long).

df <- data_frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12")))
df <- mutate(df, Date = replace_na(Date, mean(c(lag(Date),lead(Date)))))

This gives ("2020-01-01", NA, "2020-12-12") whereas what I want is ("2020-01-01, "2020-06-06", "2020-12-12")

So how do I access the previous and next row value for Date so that I can generate an average?

Upvotes: 1

Views: 622

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269674

data_frame is deprecated so we have replaced that with data.frame. tibble would be an alternative but involves an additional dependency. Use na.approx in zoo and convert it back to Date class as it produces a numeric output. This also works to interpolate multiple NAs in a row. That would not work if we used lead and lag. If there are NAs at the ends this leaves them as is or we could use different arguments to na.approx to fill those in too; however, in the example data there are no such NAs so we leave it as shown.

(Note that the correct halfway point is as shown below rather than as shown in the question. There are 173 days between 2020-01-01 and 2020-06-22 and also between 2020-06-22 and 2020-12-12.)

library(dplyr)
library(zoo)

df <- data.frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12"))) # test input

mutate(df, Date = as.Date(na.approx(Date, na.rm = FALSE)))

giving:

        Date
1 2020-01-01
2 2020-06-22
3 2020-12-12

Upvotes: 2

hello_friend
hello_friend

Reputation: 5788

Base R one liner solving your sample data (suspecting you have multiple instances of NA -- see solution below):

df$Date <- ifelse(is.na(df$Date), mean(df$Date, na.rm = TRUE), df$Date)

Interpolating dates:

df$Date <- as.Date(ifelse(
  is.na(df$Date),
  approx(as.numeric(df$Date), method = "linear", n = nrow(df))$y[which(is.na(df$Date))]
  ,
  df$Date
),
origin = as.Date("1970-01-01", "%Y-%m-%d"),
"%d-%m-%Y")

Data used:

df <- data.frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12")))

Upvotes: 1

Related Questions