Apply function based on value in other column

Question

I have a dataframe that contains 10 observation per unique ID and some other variables. Unfortunately, the data has some missing values (NA's) for a specific column that I want to replace by carrying the last observation forward (by means of the na.locf function of the zoo package).

The thing is, I need to perform this function per unique user, so I need to "split" the dataframe per unique ID, and then apply the function, while keeping all other columns. I have tried the following, but this did not work:

df %>%
  group_by(ID) %>%
  group_split() %>%
  lapply(Location, na.locf() %>%
  bind_rows

A reproducible example of the dataset:

ID <- rep(1:5, each = 10)
Age <- sample(18:65, 50, replace = TRUE)
Locations <- c("Europe", "Asia", NA)
Location <- sample(Location, 50, replace = TRUE)
df <- data.frame(ID, Age, Location)

Roland · Accepted Answer

Using group_by then na.locf, with the na.rm=FALSE option to keep the length of the output:

library(dplyr)
df <- df %>% group_by(ID) %>% 
mutate(Location=zoo::na.locf(Location, na.rm=FALSE))

As Martin commented, if the series starts with a missing value for a given ID, it will not be filled. This script does not fail, even when the series starts with a missing. For instance:

df <- tibble(ID=c(1, 1, 2, 2), Location=c(NA,"Europe","Asia",NA))
df %>% group_by(ID) %>% mutate(Location2=zoo::na.locf(Location, na.rm=FALSE))

returns:

# A tibble: 4 x 3
# Groups:   ID [2]
     ID Location Location2
           
1     1           
2     1 Europe   Europe   
3     2 Asia     Asia     
4     2      Asia

Apply function based on value in other column

Answers (1)

Related Questions