Reputation: 816
I've created a few custom tweaks to zoo::na.locf before, but this one is driving me nuts. I need a function that will carry forward the last observation of a column only if the values in another column haven't changed; and it all has to be grouped by a primary key. For example:
library(dplyr)
set.seed(20180409)
data <- data.frame(Id = rep(1:10, each = 24),
Date = rep(seq.Date(as.Date("2016-01-01"), as.Date("2017-12-01"),
by = "month"), 10),
FillCol = replace(runif(240), runif(240) < 0.9, NA),
CheckCol = rep(letters[1:7], each = 7, length.out = 240))
data <- data %>%
group_by(Id) %>%
mutate(CheckColHasChanged = replace(lag(CheckCol) != CheckCol,
is.na(lag(CheckCol) != CheckCol), TRUE),
FillColIsNA = is.na(FillCol))
So I'm trying to carry foward any observations of FillCol, but once we hit an observation where CheckColHasChanged, stop the carry forward until the next valid observation in FillCol. I can do it in a loop but I'm struggling to do it properly.
Fill <- TRUE #indicator for whether or not I should be carrying forward
for(row in 2:nrow(data)){
#if the CheckCol has changed, don't fill
if(data$CheckColHasChanged[row]){Fill <- FALSE}
#if we should fill and still have the same Id, then fill from the last obs
if(Fill & data$Id[row] == data$Id[row - 1]){
data$FillCol[row] <- data$FillCol[row - 1]
}else{ #if there's a valid obs in FillCol, set the indicator back to true
if(!data$FillColIsNA[row]){Fill <- TRUE}
}
}
Any help would be greatly appreciated!
Upvotes: 0
Views: 327
Reputation: 145775
Comment to answer: this is just filling in by both Id and CheckCol:
data %>% group_by(Id, CheckCol) %>%
mutate(result = zoo::na.locf(FillCol, na.rm = FALSE))
The way you describe CheckCol
, it is treated just like an ID. There's no difference between "only if the values in another column haven't changed" and "grouped by a primary key". You just have two columns to group by.
Upvotes: 1