luaprebel
luaprebel

Reputation: 43

dplyr::coalesce, but for multiple rows

I have some code that outputs a list of data frames, which contain some metadata and different sets of measurements. I need to be able to process the list of data frames and return a single data frame, by 'squashing' the NA values into single rows, where the metadata is the same. I'd like to use dplyr functions if possible.

I've tried multiple types of joins, bind_rows, bind_cols (then remove duplicates), the coalesce function (which looked promising, but only works for one row, not multiple.

The data input is extremely variable, with differing numbers of measurements.

An example of what I am starting with:

dfs <- list(
  df1 = tibble(
    exp = "EXP1",
    stage = "Stage1",
    section = rep(101:105, each = 2),
    meas1 = c(29L, 30L, 20L, 23L, 25L, 28L, 21L, 26L, 24L, 22L)
  ),
  df2 = tibble(
    exp = "EXP1",
    stage = "Stage1",
    section = 101:105,
    meas2 = 30:34
  ),
  df3 = tibble(
    exp = "EXP1",
    stage = "Stage2",
    section = 101:105,
    meas2 = 40:44
  )
)

$df1
# A tibble: 10 x 4
   exp   stage  section meas1
   <chr> <chr>    <int> <int>
 1 EXP1  Stage1     101    29
 2 EXP1  Stage1     101    30
 3 EXP1  Stage1     102    20
 4 EXP1  Stage1     102    23
 5 EXP1  Stage1     103    25
 6 EXP1  Stage1     103    28
 7 EXP1  Stage1     104    21
 8 EXP1  Stage1     104    26
 9 EXP1  Stage1     105    24
10 EXP1  Stage1     105    22

$df2
# A tibble: 5 x 4
  exp   stage  section meas2
  <chr> <chr>    <int> <int>
1 EXP1  Stage1     101    30
2 EXP1  Stage1     102    31
3 EXP1  Stage1     103    32
4 EXP1  Stage1     104    33
5 EXP1  Stage1     105    34

$df3
# A tibble: 5 x 4
  exp   stage  section meas2
  <chr> <chr>    <int> <int>
1 EXP1  Stage2     101    40
2 EXP1  Stage2     102    41
3 EXP1  Stage2     103    42
4 EXP1  Stage2     104    43
5 EXP1  Stage2     105    44

An example of what I'm hoping to end with:

    result <- tibble(
  exp = "EXP1",
  stage = c(rep('Stage1', 10), rep('Stage2', 5)),
  section = c(rep(101:105, each = 2), 101:105),
  meas1 = c(c(29L, 30L, 20L, 23L, 25L, 28L, 21L, 26L, 24L, 22L), rep(NA_real_, 5)),
  meas2 = c(30, NA_real_, 31, NA_real_, 32, NA_real_, 33, NA_real_, 34, NA_real_, 40:44)
)



 # A tibble: 15 x 5
   exp   stage  section meas1 meas2
   <chr> <chr>    <int> <dbl> <dbl>
 1 EXP1  Stage1     101    29    30
 2 EXP1  Stage1     101    30    NA
 3 EXP1  Stage1     102    20    31
 4 EXP1  Stage1     102    23    NA
 5 EXP1  Stage1     103    25    32
 6 EXP1  Stage1     103    28    NA
 7 EXP1  Stage1     104    21    33
 8 EXP1  Stage1     104    26    NA
 9 EXP1  Stage1     105    24    34
10 EXP1  Stage1     105    22    NA
11 EXP1  Stage2     101    NA    40
12 EXP1  Stage2     102    NA    41
13 EXP1  Stage2     103    NA    42
14 EXP1  Stage2     104    NA    43
15 EXP1  Stage2     105    NA    44

Thanks for any help you can give me. After several years of R programming, this is my first question asked on SO. I've always been able to figure it out using SO.

Upvotes: 4

Views: 541

Answers (1)

akrun
akrun

Reputation: 887831

We can use bind_rows to bind the list elements and then do a group by operation

library(dplyr)
bind_rows(dfs) %>% 
    group_by(exp, stage, section) %>% 
    summarise(across(everything(), ~ if(all(is.na(.))) NA_integer_ 
        else na.omit(.)))

Upvotes: 2

Related Questions