Reputation: 35
So I have a tibble (data frame) like this (the actual data frame is like 100+ rows)
sample_ID <- c(1, 2, 2, 3)
A <- c(NA, NA, 1, 3)
B <- c(1, 2, NA, 1)
C <- c(5, 1, NA, 2)
D <- c(NA, NA, 3, 1)
tibble(sample_ID,A,B,C,D)
# which reads
# A tibble: 4 × 5
sample_ID A B C D
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 1 5 NA
2 2 NA 2 1 NA
3 2 1 NA NA 3
4 3 3 1 2 1
As can be seen here, the second and third rows have the same sample ID. I want to combine these two rows so that the tibble looks like
# A tibble: 3 × 5
sample_ID A B C D
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 1 5 NA
2 2 1 2 1 3
3 3 3 1 2 1
In other words, I want the rows for sample_ID to be unique (order doesn't matter), and the values of other columns are merged (overwrite NA when possible). Can this be achieved in a simple way, such as using gather and spread? Many thanks.
Upvotes: 1
Views: 994
Reputation: 887971
We can use summarise_each
after grouping by 'sample_ID'
library(dplyr)
df %>%
group_by(sample_ID) %>%
summarise_each(funs(na.omit))
# A tibble: 3 × 5
# sample_ID A B C D
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 NA 1 5 NA
#2 2 1 2 1 3
#3 3 3 1 2 1
Upvotes: 1