Reputation: 1460
I find it hard to describe my problem clearly, so here is an example of what I want to do.
I have a dataframe:
a <- data.frame(gene = c("A", "A", "A", "B", "B", "C"),
                id = c(100, 100, 30, 250, 250, 600),
                where = c("human", "flow", "apple", "human", "rock", "ghost"))
I want to collapse the duplicated rows into one row per gene, while keeping the distinct values of the other columns, and get an output like this:
gene  id       where
A     100, 30  human, flow, apple
B     250      human, rock
C     600      ghost
Thanks a lot for your help.
Reputation: 39154
A solution using dplyr.
library(dplyr)
a2 <- a %>%
  group_by(gene) %>%
  summarize_all(list(~toString(unique(.))))
a2
# # A tibble: 3 x 3
#   gene  id      where
#   <fct> <chr>   <chr>
# 1 A     100, 30 human, flow, apple
# 2 B     250     human, rock
# 3 C     600     ghost
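In dplyr 1.0.0 and later, the scoped verbs such as summarize_all are marked as superseded in favour of across(). A minimal sketch of the same aggregation in that style, assuming a recent dplyr is installed (the name a3 is only for illustration, and the data frame is rebuilt so the snippet runs on its own):

```r
library(dplyr)

# Same example data as in the question
a <- data.frame(gene = c("A", "A", "A", "B", "B", "C"),
                id = c(100, 100, 30, 250, 250, 600),
                where = c("human", "flow", "apple", "human", "rock", "ghost"))

# across(everything(), ...) applies the function to every non-grouping column
a3 <- a %>%
  group_by(gene) %>%
  summarise(across(everything(), ~ toString(unique(.x))))
a3
```

The result should have the same three rows as a2 above, with id and where collapsed into character columns.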
Or use data.table.
library(data.table)
setDT(a)[, lapply(.SD, function(x) toString(unique(x))), by = gene][]
#    gene      id              where
# 1:    A 100, 30 human, flow, apple
# 2:    B     250        human, rock
# 3:    C     600              ghost
Or base R.
aggregate(x = a[, !names(a) %in% "gene"],
          by = a[, "gene", drop = FALSE],
          FUN = function(x) toString(unique(x)))
#   gene      id              where
# 1    A 100, 30 human, flow, apple
# 2    B     250        human, rock
# 3    C     600              ghost
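All three approaches apply the same building block to each group: unique() drops the repeated values and toString() joins what is left with ", ". A small sketch on a single vector (the names ids and collapsed are only for illustration):

```r
# The id values for gene "A" in the example data
ids <- c(100, 100, 30)

# unique() keeps the first occurrence of each value, toString() pastes them together
collapsed <- toString(unique(ids))
collapsed
# [1] "100, 30"
```

Note that the result is a character string, so the id column is no longer numeric after the aggregation.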