Z. Zhang

Reputation: 757

Is there a tidyverse way to fill NA values mutually between pairs of columns?

Suppose I have a large number of "var_max" and "var_min" variable pairs. Either column of a pair, or both, can contain missing values. Whenever one column of a pair is NA and the other is not, I want to fill the NA from the other column.

dd <- data.frame(lac_max=c(1:4,NA,NA),lac_min=c(1:3,NA,3,2),
                 hr_max=c(32,4,3,NA,3,1),hr_min=c(NA,3,2,NA,4,1))
> dd
  lac_max lac_min hr_max hr_min
1       1       1     32     NA
2       2       2      4      3
3       3       3      3      2
4       4      NA     NA     NA
5      NA       3      3      4
6      NA       2      1      1

dd_filled <- data.frame(lac_max=c(1:4,3,2),lac_min=c(1:3,4,3,2),
                        hr_max=c(32,4,3,NA,3,1),hr_min=c(32,3,2,NA,4,1))
> dd_filled
  lac_max lac_min hr_max hr_min
1       1       1     32     32
2       2       2      4      3
3       3       3      3      2
4       4       4     NA     NA
5       3       3      3      4
6       2       2      1      1

dd_filled is the result I want. I have tried to solve this with mutate(across(...)), but I cannot figure out how to do it.

Upvotes: 0

Views: 181

Answers (2)

ekoam

Reputation: 8844

Something like this?

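library(tidyr)
library(dplyr)

# `. %>% ...` builds a reusable function (a functional sequence).
fill_pairs <- 
  . %>% 
  mutate(id = row_number()) %>%                    # remember the original row
  pivot_longer(-id, names_to = c("var", "type"), names_sep = "_") %>%  # lac_max -> var "lac", type "max"
  group_by(id, var) %>%                            # one group = the max/min pair of one row
  fill(value, .direction = "downup") %>%           # fill a missing member of the pair from the other
  ungroup() %>% 
  pivot_wider(names_from = c("var", "type")) %>%   # back to lac_max, lac_min, hr_max, hr_min
  select(-id)

# Only the _max/_min columns are passed in; the unnamed data frame that comes
# back replaces those columns, and every other column is left as it is.
# (In dplyr >= 1.1.0, pick() is the preferred way to select columns without .fns.)
dd %>% mutate(fill_pairs(across(ends_with(c("_max", "_min")))))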

Here dd also has two unrelated columns, random_col1 and random_col2, to show that they pass through unchanged:

> dd
  lac_max lac_min hr_max hr_min random_col1 random_col2
1       1       1     32     NA           1           a
2       2       2      4      3           2           b
3       3       3      3      2           3           c
4       4      NA     NA     NA          NA           d
5      NA       3      3      4          NA           e
6      NA       2      1      1           6           f

Output looks like this

  lac_max lac_min hr_max hr_min random_col1 random_col2
1       1       1     32     32           1           a
2       2       2      4      3           2           b
3       3       3      3      2           3           c
4       4       4     NA     NA          NA           d
5       3       3      3      4          NA           e
6       2       2      1      1           6           f
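One caveat with names_sep = "_" in fill_pairs: it splits on every underscore, so it assumes each prefix (lac, hr) contains no underscore of its own. A name such as heart_rate_max would split into three pieces instead of two. A minimal base R sketch of splitting on the final suffix instead, using hypothetical column names (in tidyr, the corresponding option would be pivot_longer's names_pattern, e.g. names_pattern = "(.*)_(max|min)"):

```r
# Hypothetical column names; "heart_rate" contains an underscore of its own.
nms <- c("lac_max", "lac_min", "heart_rate_max", "heart_rate_min")

# Splitting on every "_" gives a different number of pieces per name.
pieces <- strsplit(nms, "_", fixed = TRUE)
print(lengths(pieces))

# Anchoring on the final _max/_min suffix keeps the whole prefix together.
var  <- sub("_(max|min)$", "", nms)
type <- sub("^.*_(max|min)$", "\\1", nms)
print(data.frame(name = nms, var = var, type = type))
```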

Upvotes: 1

mt1022

Reputation: 17299

You can call coalesce() twice per pair, once in each direction:

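library(dplyr)
dd %>% mutate(
    # lac_max is integer (built from 1:4) while lac_min is double;
    # convert it so both coalesce() calls work on the same type.
    lac_max = as.numeric(lac_max),
    # Fill each column from its partner. Only the lac pair is handled here,
    # so hr_max/hr_min would need the same two lines.
    lac_max = coalesce(lac_max, lac_min),
    lac_min = coalesce(lac_min, lac_max))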
#   lac_max lac_min hr_max hr_min
# 1       1       1     32     NA
# 2       2       2      4      3
# 3       3       3      3      2
# 4       4       4     NA     NA
# 5       3       3      3      4
# 6       2       2      1      1
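Because each coalesce() call handles a single pair, many pairs mean repeating these lines for every prefix. As a minimal base R sketch (not part of the answer above), the same mutual fill can be looped over every pair, assuming each column ending in _max has a partner with the same prefix ending in _min. fill_minmax_pairs is a hypothetical name, and ifelse() stands in for coalesce() so the block needs no packages:

```r
# Hypothetical helper: fill each <prefix>_max / <prefix>_min pair from its partner.
# Assumes plain numeric columns: ifelse() drops attributes such as a Date class.
fill_minmax_pairs <- function(df) {
  max_cols <- grep("_max$", names(df), value = TRUE)
  for (mx in max_cols) {
    mn <- sub("_max$", "_min", mx)
    if (!mn %in% names(df)) next  # skip a _max column with no _min partner
    a <- df[[mx]]
    b <- df[[mn]]
    # Both fills use the original values, so the order of the two lines does not matter.
    df[[mx]] <- ifelse(is.na(a), b, a)
    df[[mn]] <- ifelse(is.na(b), a, b)
  }
  df
}

dd <- data.frame(lac_max = c(1:4, NA, NA), lac_min = c(1:3, NA, 3, 2),
                 hr_max = c(32, 4, 3, NA, 3, 1), hr_min = c(NA, 3, 2, NA, 4, 1))
out <- fill_minmax_pairs(dd)
print(out)
```

Rows where both members of a pair are NA (row 4 of hr_max/hr_min) stay NA, and columns that do not end in _max/_min are never touched.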

Upvotes: 0
