Reputation: 5719
I have this data frame called mydf
. There are duplicated samples in the Sample
column. I want to extract the unique sample rows with the maximum total_reads
and get the result
.
mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191",
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988,
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4",
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166",
"169", "170", "171", "173", "174", "175"), class = "data.frame")
result
Sample total_reads Lane
AOGC-02-0188 27392583 4
AOGC-02-0191 53669483 4;5
AOGC-02-0194 68151814 4;5
Upvotes: 2
Views: 1449
Reputation: 886938
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(mydf)
), grouped by "Sample", order
the 'total_reads' desendingly and subset the first observation with head
.
library(data.table)
setDT(mydf)[order(-total_reads), head(.SD, 1) , Sample]
Upvotes: 2
Reputation: 51582
You can aggregate
and then merge
,
merge(aggregate(total_reads ~ Sample, mydf, max), mydf)
# Sample total_reads Lane
#1 AOGC-02-0188 27392583 4
#2 AOGC-02-0191 53669483 4;5
#3 AOGC-02-0194 68151814 4;5
Upvotes: 4
Reputation: 555
Using the dplyr
package, you could do that like this:
mydf %>%
group_by(Sample) %>% # for each unique sample
arrange(-total_reads) %>% # order by total_reads DESC
slice(1) # select the first row, i.e. with highest total_reads
Upvotes: 2