Jordo82
Jordo82

Reputation: 816

na.locf only if another column hasn't changed

I've created a few custom tweaks to zoo::na.locf before, but this one is driving me nuts. I need a function that will carry forward the last observation of a column only if the values in another column haven't changed; and it all has to be grouped by a primary key. For example:

library(dplyr)
set.seed(20180409)

data <- data.frame(Id = rep(1:10, each = 24),
               Date = rep(seq.Date(as.Date("2016-01-01"), as.Date("2017-12-01"), 
                                   by = "month"), 10),
               FillCol = replace(runif(240), runif(240) < 0.9, NA),
               CheckCol = rep(letters[1:7], each = 7, length.out = 240))

data <- data %>% 
  group_by(Id) %>% 
  mutate(CheckColHasChanged = replace(lag(CheckCol) != CheckCol, 
                                      is.na(lag(CheckCol) != CheckCol), TRUE),
         FillColIsNA = is.na(FillCol))

So I'm trying to carry foward any observations of FillCol, but once we hit an observation where CheckColHasChanged, stop the carry forward until the next valid observation in FillCol. I can do it in a loop but I'm struggling to do it properly.

Fill <- TRUE #indicator for whether or not I should be carrying forward
for(row in 2:nrow(data)){

  #if the CheckCol has changed, don't fill
  if(data$CheckColHasChanged[row]){Fill <- FALSE}

  #if we should fill and still have the same Id, then fill from the last obs
  if(Fill & data$Id[row] == data$Id[row - 1]){
    data$FillCol[row] <- data$FillCol[row - 1]
  }else{ #if there's a valid obs in FillCol, set the indicator back to true
    if(!data$FillColIsNA[row]){Fill <- TRUE}
  }
}

Any help would be greatly appreciated!

Upvotes: 0

Views: 327

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145775

Comment to answer: this is just filling in by both Id and CheckCol:

data %>% group_by(Id, CheckCol) %>% 
    mutate(result = zoo::na.locf(FillCol, na.rm = FALSE))

The way you describe CheckCol, it is treated just like an ID. There's no difference between "only if the values in another column haven't changed" and "grouped by a primary key". You just have two columns to group by.

Upvotes: 1

Related Questions