Reputation: 899
I have some data frames that look like this:
df1 <- data.frame(chr=c(3,3,3,1,1),start=c(15,52,17,1,80),end=c(52,68,18,15,92),strand=c("+","+","+","-","-"),item=c("A","A","B","C","C"))
  chr start end strand item
1   3    15  52      +    A
2   3    52  68      +    A
3   3    17  18      +    B
4   1     1  15      -    C
5   1    80  92      -    C
Items A and C can have two or more different starts and ends, but the remaining columns are the same within each group. Is there a way to concatenate the start and end information like this?
  chr start   end strand item
1   3 15,52 52,68      +    A
2   3    17    18      +    B
3   1  1,80 15,92      -    C
Thanks for your help!
Upvotes: 1
Views: 42
Reputation: 886938
We can group by 'chr', 'strand', and 'item', and paste the 'start' and 'end' values together with toString (equivalent to paste(., collapse=", ")):
library(dplyr)
df1 %>%
  group_by(chr, strand, item) %>%
  summarise(across(c(start, end), toString), .groups = 'drop') %>%
  arrange(item)
Output:
# A tibble: 3 x 5
# chr strand item start end
# <dbl> <chr> <chr> <chr> <chr>
#1 3 + A 15, 52 52, 68
#2 3 + B 17 18
#3 1 - C 1, 80 15, 92
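Note that toString separates values with a comma and a space, while the output requested in the question uses a bare comma. As a small illustration (the vector x below is just the 'start' values of item A, used as an example), the two separators can be compared in base R:

```r
# Example vector: the 'start' values of item A from the question
x <- c(15, 52)

# toString() is a shortcut for paste(., collapse = ", ")
with_space <- toString(x)
same_as_paste <- paste(x, collapse = ", ")

# A bare comma, matching the format asked for in the question
bare_comma <- paste(x, collapse = ",")

print(with_space)
print(bare_comma)
```

If the exact "15,52" format matters, swapping toString for a function such as function(x) paste(x, collapse = ",") inside across() should be enough.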
Or using base R with aggregate:
aggregate(cbind(start, end) ~ ., df1, toString)
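For a self-contained version of the base R approach, here is a minimal sketch. It assumes the question's data is stored as df1, spells out the grouping columns instead of using the dot, and uses a hypothetical helper collapse_comma (not part of the original answer) so the result has no space after the comma:

```r
# Sample data from the question, assumed to be stored as df1
df1 <- data.frame(
  chr    = c(3, 3, 3, 1, 1),
  start  = c(15, 52, 17, 1, 80),
  end    = c(52, 68, 18, 15, 92),
  strand = c("+", "+", "+", "-", "-"),
  item   = c("A", "A", "B", "C", "C")
)

# Hypothetical helper: join values with a bare comma
# (toString would use ", " instead)
collapse_comma <- function(x) paste(x, collapse = ",")

# Collapse 'start' and 'end' within each chr/strand/item group
res <- aggregate(cbind(start, end) ~ chr + strand + item,
                 data = df1, FUN = collapse_comma)

# Sort by item and restore the column order used in the question
res <- res[order(res$item), c("chr", "start", "end", "strand", "item")]
print(res)
```

Here item A ends up with start "15,52" and end "52,68". The collapsed columns are character, so they would need to be split again (for example with strsplit) before any numeric work.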
Upvotes: 1