bio8

Reputation: 174

R: How to collapse duplicated rows into one row per value of a single column, merging the values of the other columns with |?

I have the following data frame:

gene    gene_name   source  chromosome  details
1       a           A       2           01; xyz
1       a           A       2           02; ijk
2       b           B       3           03; efg
2       b           C       3           03; efg
3       c           D       4           04; lmn
3       c           D       4           05; opq
3       c           D       4           06; rst
4       NA          10      6           NA
4       NA          11      6           NA
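To make the example reproducible, here is a sketch that rebuilds the input above in R with read.table(). The details values are quoted because they contain spaces, and the literal NA entries are read as missing values (read.table's default na.strings). Treating source as text (it mixes letters and numbers) is an assumption about the real data.

```r
# Rebuild the example input as a data frame.
# details is quoted because its values contain spaces; "NA" becomes a missing value.
df <- read.table(text = "
gene gene_name source chromosome details
1    a         A      2          '01; xyz'
1    a         A      2          '02; ijk'
2    b         B      3          '03; efg'
2    b         C      3          '03; efg'
3    c         D      4          '04; lmn'
3    c         D      4          '05; opq'
3    c         D      4          '06; rst'
4    NA        10     6          NA
4    NA        11     6          NA
", header = TRUE, stringsAsFactors = FALSE)

str(df)
```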

I want to get the following output:

gene    gene_name   source  chromosome  details
1       a           A       2           01; xyz | 02; ijk
2       b           B, C    3           03; efg
3       c           D       4           04; lmn | 05; opq | 06; rst
4       NA          10, 11  6           NA | NA

I have tried aggregate() and group_by() in different ways, but could not get this result.

Any guidance would be appreciated.

Thanks.

Upvotes: 0

Views: 1068

Answers (1)

Martin

Reputation: 66

This should work:

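library(dplyr)
# source is not a grouping column: its unique values are joined with ", "
df %>%
  group_by(gene, gene_name, chromosome) %>%
  summarise(source = paste(unique(source), collapse = ", "),
            details = paste(details, collapse = " | "))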

I ran the following on the built-in iris data set and got a result similar to what you described:

iris %>%
  group_by(Sepal.Length, Sepal.Width, Petal.Length, Species) %>%
  summarise(Petal.Width = paste(Petal.Width, collapse = " | "))
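If dplyr is not available, the same collapse can be sketched in base R with split(), lapply() and rbind(). This is a swapped-in technique, not the answer's method. aggregate() is less convenient here for two reasons: it applies one function to every column, and it omits rows whose grouping (by) variables are NA, so grouping by gene_name would drop gene 4. The helper name collapse_gene is hypothetical, and the sketch assumes gene itself is never NA.

```r
# Example input from the question (source kept as text because it mixes letters and numbers).
df <- data.frame(
  gene       = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  gene_name  = c("a", "a", "b", "b", "c", "c", "c", NA, NA),
  source     = c("A", "A", "B", "C", "D", "D", "D", "10", "11"),
  chromosome = c(2, 2, 3, 3, 4, 4, 4, 6, 6),
  details    = c("01; xyz", "02; ijk", "03; efg", "03; efg",
                 "04; lmn", "05; opq", "06; rst", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all rows of one gene into a single row.
collapse_gene <- function(g) {
  data.frame(
    gene       = g$gene[1],
    gene_name  = g$gene_name[1],
    source     = paste(unique(g$source), collapse = ", "),  # distinct sources
    chromosome = g$chromosome[1],
    details    = paste(g$details, collapse = " | "),        # every detail, NA -> "NA"
    stringsAsFactors = FALSE
  )
}

# Split by gene, collapse each piece, then stack the one-row results.
out <- do.call(rbind, lapply(split(df, df$gene), collapse_gene))
rownames(out) <- NULL
print(out)
```

To show each repeated detail only once (as the desired output does for gene 2), wrap g$details in unique(). Note that this also reduces gene 4's "NA | NA" to a single "NA".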

Upvotes: 1
