Reputation: 757
Suppose I have a large number of "var_max" and "var_min" variables. Either member of a pair, or both, can have missing values. I want to fill the NA values of one variable from the other whenever the other variable is non-missing.
dd <- data.frame(lac_max = c(1:4, NA, NA), lac_min = c(1:3, NA, 3, 2),
                 hr_max = c(32, 4, 3, NA, 3, 1), hr_min = c(NA, 3, 2, NA, 4, 1))
> dd
  lac_max lac_min hr_max hr_min
1       1       1     32     NA
2       2       2      4      3
3       3       3      3      2
4       4      NA     NA     NA
5      NA       3      3      4
6      NA       2      1      1
dd_filled <- data.frame(lac_max = c(1:4, 3, 2), lac_min = c(1:3, 4, 3, 2),
                        hr_max = c(32, 4, 3, NA, 3, 1), hr_min = c(32, 3, 2, NA, 4, 1))
> dd_filled
  lac_max lac_min hr_max hr_min
1       1       1     32     32
2       2       2      4      3
3       3       3      3      2
4       4       4     NA     NA
5       3       3      3      4
6       2       2      1      1
dd_filled is the result I want.
I have tried to solve this in the mutate(across) style, but I cannot figure out how to do it.
Upvotes: 0
Views: 181
Reputation: 8844
Something like this?
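library(tidyr)
library(dplyr)

# Functional sequence: reshape to long form, fill within each row/variable
# pair, then reshape back to the original wide layout.
fill_pairs <-
  . %>%
  mutate(id = row_number()) %>%
  # split names such as "lac_max" into var = "lac" and type = "max"
  pivot_longer(-id, names_to = c("var", "type"), names_sep = "_") %>%
  group_by(id, var) %>%
  # within each (row, var) pair, copy the non-missing value over the NA
  fill(value, .direction = "downup") %>%
  ungroup() %>%
  pivot_wider(names_from = c("var", "type")) %>%
  select(-id)

# Apply only to the *_max / *_min columns; other columns are left untouched.
dd %>% mutate(fill_pairs(across(ends_with(c("_max", "_min")))))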
Here dd, with two extra columns that do not belong to any pair, looks like this:
> dd
  lac_max lac_min hr_max hr_min random_col1 random_col2
1       1       1     32     NA           1           a
2       2       2      4      3           2           b
3       3       3      3      2           3           c
4       4      NA     NA     NA          NA           d
5      NA       3      3      4          NA           e
6      NA       2      1      1           6           f
The output looks like this:
  lac_max lac_min hr_max hr_min random_col1 random_col2
1       1       1     32     32           1           a
2       2       2      4      3           2           b
3       3       3      3      2           3           c
4       4       4     NA     NA          NA           d
5       3       3      3      4          NA           e
6       2       2      1      1           6           f
Upvotes: 1
Reputation: 17299
You can call coalesce twice per pair to achieve this:
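library(dplyr)

# Only the lac pair is handled here; hr_max / hr_min would need the same steps.
dd %>% mutate(
  lac_max = as.numeric(lac_max),          # lac_max is integer, lac_min is double
  lac_max = coalesce(lac_max, lac_min),   # fill lac_max from lac_min
  lac_min = coalesce(lac_min, lac_max))   # fill lac_min from the updated lac_max
#   lac_max lac_min hr_max hr_min
# 1       1       1     32     NA
# 2       2       2      4      3
# 3       3       3      3      2
# 4       4       4     NA     NA
# 5       3       3      3      4
# 6       2       2      1      1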
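Since the question mentions a large number of pairs, the same coalesce idea can be looped over every pair instead of being written out by hand. The following is only a sketch, not part of the original answer: the helper name fill_pairs_coalesce and the stems vector are hypothetical, and it assumes every stem has both a "_max" and a "_min" column and that all of them are numeric.

```r
library(dplyr)

dd <- data.frame(lac_max = c(1:4, NA, NA), lac_min = c(1:3, NA, 3, 2),
                 hr_max = c(32, 4, 3, NA, 3, 1), hr_min = c(NA, 3, 2, NA, 4, 1))

# Hypothetical helper: for each stem (e.g. "lac"), fill <stem>_max from
# <stem>_min and vice versa. Assumes both columns exist and are numeric.
fill_pairs_coalesce <- function(df, stems) {
  for (s in stems) {
    mx <- paste0(s, "_max")
    mn <- paste0(s, "_min")
    # convert to double first so integer and double columns can be combined
    max_vals <- as.numeric(df[[mx]])
    min_vals <- as.numeric(df[[mn]])
    df[[mx]] <- coalesce(max_vals, min_vals)
    df[[mn]] <- coalesce(min_vals, max_vals)
  }
  df
}

# Derive the stems from the column names that end in _max or _min.
pair_cols <- grep("_(max|min)$", names(dd), value = TRUE)
stems <- unique(sub("_(max|min)$", "", pair_cols))

res <- fill_pairs_coalesce(dd, stems)
res
```

Columns that do not end in "_max" or "_min" are never touched, so extra columns in the data frame pass through unchanged.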
Upvotes: 0