How to combine the values of various columns in a tibble by the same row ID

Question

So I have a tibble (data frame) like this (the actual data frame is like 100+ rows)

sample_ID <- c(1, 2, 2, 3)
A <- c(NA, NA, 1, 3)
B <- c(1, 2, NA, 1)
C <- c(5, 1, NA, 2)
D <- c(NA, NA, 3, 1)
tibble(sample_ID,A,B,C,D)

# which reads
# A tibble: 4 × 5
  sample_ID     A     B     C     D
           
1         1    NA     1     5    NA
2         2    NA     2     1    NA
3         2     1    NA    NA     3
4         3     3     1     2     1

As can be seen here, the second and third rows have the same sample ID. I want to combine these two rows so that the tibble looks like

# A tibble: 3 × 5
  sample_ID     A     B     C     D
           
1         1    NA     1     5    NA
2         2     1     2     1     3
3         3     3     1     2     1

In other words, I want the rows for sample_ID to be unique (order doesn't matter), and the values of other columns are merged (overwrite NA when possible). Can this be achieved in a simple way, such as using gather and spread? Many thanks.

akrun · Accepted Answer

We can use summarise_each after grouping by 'sample_ID'

library(dplyr)
df %>% 
   group_by(sample_ID) %>% 
   summarise_each(funs(na.omit))
# A tibble: 3 × 5
#   sample_ID     A     B     C     D
#          
#1         1    NA     1     5    NA
#2         2     1     2     1     3
#3         3     3     1     2     1

How to combine the values of various columns in a tibble by the same row ID

Answers (1)

Related Questions