Reputation: 21
I have a dataframe of samples and the genes associated with each sample. I need to collect all the genes associated with each sample into a new dataframe.
Sample | Genes |
---|---|
A | Gene1 |
A | Gene2 |
B | Gene3 |
B | Gene2 |
C | Gene3 |
B | Gene1 |
C | Gene4 |
I need to create a dataframe like the one below:
Sample | Genes |
---|---|
A | Gene1, Gene2 |
B | Gene3, Gene2, Gene1 |
C | Gene3, Gene4 |
What's the best possible way to do this?
Upvotes: 0
Views: 76
Reputation: 2412
Like this?
library(dplyr)
data <- data.frame(
Sample = c('A', 'A', 'B', 'B', 'C', 'B', 'C'),
Genes = c('Gene1', 'Gene2', 'Gene3', 'Gene2', 'Gene3', 'Gene1', 'Gene4')
)
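# Collapse each sample's genes into one comma-separated string,
# keeping first-appearance order (e.g. B -> "Gene3, Gene2, Gene1")
result <- data %>%
  group_by(Sample) %>%
  summarize(Genes = paste(unique(Genes), collapse = ", "))
# To keep the genes as a list-column instead of a string, use:
# summarize(Genes = list(unique(Genes)))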
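For comparison, here is a minimal base-R sketch that avoids dplyr, assuming the same `Sample` and `Genes` columns as above. `aggregate()` with a formula splits `Genes` by `Sample` and applies the collapsing function to each group. The output rows are sorted by `Sample`, and within each group the genes stay in the order they first appear.

```r
# Example data from the question (character columns, not factors)
data <- data.frame(
  Sample = c('A', 'A', 'B', 'B', 'C', 'B', 'C'),
  Genes  = c('Gene1', 'Gene2', 'Gene3', 'Gene2', 'Gene3', 'Gene1', 'Gene4'),
  stringsAsFactors = FALSE
)

# Group Genes by Sample and collapse each group's unique genes
# into a single comma-separated string
result <- aggregate(
  Genes ~ Sample,
  data = data,
  FUN = function(g) paste(unique(g), collapse = ", ")
)

print(result)
```

Because `FUN` returns one string per sample, `aggregate()` returns a plain character column. Return `list(unique(g))` from `FUN` if you want a list-column instead.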
Upvotes: 2