fugu

Reputation: 6578

Take biggest value for each group in data frame

I have a data frame:

  sample event length
1     A1   DEL     30
2     A1   INV     10
3     A1   DEL     30
4     A2   DEL     10
5     A2   INV     20
6     A3   DEL     40

myData <- structure(list(sample = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("A1", 
"A2", "A3"), class = "factor"), event = structure(c(1L, 2L, 1L, 
1L, 2L, 1L), .Label = c("DEL", "INV"), class = "factor"), length = c(30, 
10, 30, 10, 20, 40)), .Names = c("sample", "event", "length"), row.names = c(NA, 
-6L), class = "data.frame")

I am trying to plot the length of each event for each sample. Some samples have multiple events, some of which are of the same type. In that case I would like to plot only the longest event per sample, rather than summing the values per sample as ggplot currently does:

library(ggplot2)
p <- ggplot(myData)
p <- p + geom_bar(aes(sample, length), stat = "identity")
p
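Why the bars look summed: with `stat = "identity"`, `geom_bar` draws one segment per row and, by default, stacks segments that share the same x value, so each bar's height is the per-sample total of `length`. A quick base-R sketch that computes those totals; the data is rebuilt with `data.frame()` (plain character columns instead of factors, an assumption that does not affect the result) so the snippet runs on its own:

```r
# Rebuild the example data so this snippet is self-contained
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# Per-sample sums: these are the heights of the stacked bars
totals <- aggregate(length ~ sample, data = myData, FUN = sum)
print(totals)
```

A1 comes out at 70 (30 + 10 + 30), which is why its bar is much taller than its longest single event.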


For example, I would like to reduce my data frame to:

  sample event length
1     A1   DEL     30
5     A2   INV     20
6     A3   DEL     40

Can anyone suggest how I could go about this?

Upvotes: 1

Views: 30

Answers (2)

Mike H.

Reputation: 14360

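You can do this without manipulating the data beforehand, by letting stat_summary compute the maximum per sample (in ggplot2 3.3.0 and later the argument is fun; older versions use fun.y):

library(ggplot2)
ggplot(myData) + stat_summary(aes(x = sample, y = length), geom = "bar", fun = max)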
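`stat_summary()` applies the summary function to the y values at each x value and draws one bar per result, so with `fun = max` each bar's height is the per-sample maximum. A minimal base-R sketch that computes the same numbers, useful for checking the plot (the data frame is rebuilt with character columns so the snippet is self-contained):

```r
# Rebuild the example data so this snippet is self-contained
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# One row per sample with its largest length: the bar heights stat_summary draws
maxima <- aggregate(length ~ sample, data = myData, FUN = max)
print(maxima)
```

Note that this summary drops the event column; if you also need to know which event was longest, a row-selecting approach such as which.max is the better fit.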

Alternatively, with data.table you can reduce the data first and then plot:

library(data.table)
library(ggplot2)
setDT(myData)[, .SD[which.max(length)], by = sample][,ggplot(.SD) + geom_bar(aes(x = sample, y = length), stat = "identity")]

Interestingly, you can call ggplot inside the data.table syntax, here in the j part of the second [...].
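The step `.SD[which.max(length)], by = sample` keeps, within each sample, the first row whose `length` is largest. For readers without data.table, a base-R sketch of the same per-group `which.max` selection using `split()` and `lapply()` (a swapped-in technique, not part of the answer above; the data is rebuilt so the snippet runs on its own):

```r
# Rebuild the example data so this snippet is self-contained
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# Split into one data frame per sample and keep the row at which.max(length)
per_sample <- lapply(split(myData, myData$sample),
                     function(d) d[which.max(d$length), ])

# Bind the one-row pieces back into a single data frame
longest <- do.call(rbind, per_sample)
print(longest)
```

The result has one row per sample (A1 DEL 30, A2 INV 20, A3 DEL 40), which can be passed to the original geom_bar call unchanged.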


Upvotes: 2

akrun

Reputation: 887118

We can use which.max after grouping by sample:

library(dplyr)
library(ggplot2)
myData %>%
    group_by(sample) %>%
    slice(which.max(length)) %>%
    ggplot(.) + 
    geom_bar(aes(sample, length), stat = 'identity')
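The which.max approaches keep exactly one row per sample even when the maximum is tied, as it is for A1 (rows 1 and 3 are both DEL with length 30); which.max simply returns the first. If you would rather keep every tied row, one base-R option (an alternative technique, sketched here with the data rebuilt for self-containment) is to compare each length with its group maximum via `ave()`:

```r
# Rebuild the example data so this snippet is self-contained
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# Per-sample maximum, repeated so it lines up with the original rows
group_max <- ave(myData$length, myData$sample, FUN = max)

# Keep every row that reaches its sample's maximum (ties included)
all_longest <- myData[myData$length == group_max, ]
print(all_longest)
```

This keeps rows 1, 3, 5 and 6. Because the two A1 rows would stack again under geom_bar(stat = "identity"), this variant is better suited to inspecting ties than to plotting.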


Upvotes: 2
