Reputation: 6578
I have a data frame:
  sample event length
1     A1   DEL     30
2     A1   INV     10
3     A1   DEL     30
4     A2   DEL     10
5     A2   INV     20
6     A3   DEL     40
myData <- structure(list(sample = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("A1",
"A2", "A3"), class = "factor"), event = structure(c(1L, 2L, 1L,
1L, 2L, 1L), .Label = c("DEL", "INV"), class = "factor"), length = c(30,
10, 30, 10, 20, 40)), .Names = c("sample", "event", "length"), row.names = c(NA,
-6L), class = "data.frame")
I am trying to plot the length of each event for each sample. Some samples have multiple events, some of which are the same, and in that case I would like to plot only the longest event per sample, rather than the sum of the values per sample, which is what ggplot currently shows:
library(ggplot2)
p <- ggplot(myData)
p <- p + geom_bar(aes(sample, length), stat = "identity")
p
For example, I would like to reduce my data frame to:
  sample event length
1     A1   DEL     30
5     A2   INV     20
6     A3   DEL     40
Can anyone suggest how I could go about this?
Upvotes: 1
Views: 30
Reputation: 14360
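You can do this without manipulating the data beforehand, by letting stat_summary compute the maximum per sample:
# on ggplot2 versions before 3.3.0, use fun.y = max instead of fun = max
ggplot(myData) + stat_summary(aes(x = sample, y = length), geom = "bar", fun = max)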
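To check the values that the stat_summary call draws, the per-sample maxima can also be computed directly. This is a minimal base R sketch, not part of the original answer; the data frame is rebuilt here with plain character columns so the snippet runs on its own, and the name `maxima` is only illustrative.

```r
# Rebuild the question's data (character columns instead of factors)
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# One row per sample, holding the largest length in that sample
maxima <- aggregate(length ~ sample, data = myData, FUN = max)
print(maxima)
```

Note that this keeps only sample and length; the event column is dropped, which is fine for the bar heights but not for recovering the full rows.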
Alternatively, a data.table approach that reduces the data first is:
library(data.table)
library(ggplot2)
setDT(myData)[, .SD[which.max(length)], by = sample][,ggplot(.SD) + geom_bar(aes(x = sample, y = length), stat = "identity")]
Interestingly, you can call ggplot within the data.table syntax.
Upvotes: 2
Reputation: 887118
We can use which.max after grouping by 'sample':
library(dplyr)
library(ggplot2)
myData %>%
  group_by(sample) %>%
  slice(which.max(length)) %>%
  ggplot(.) +
  geom_bar(aes(sample, length), stat = 'identity')
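For comparison, the same "keep the row with the largest length in each sample" step can be sketched in base R without extra packages. This is an illustrative variant rather than the answer's method; the data frame is rebuilt with character columns so the snippet is self-contained, and `longest` is a hypothetical name.

```r
# Rebuild the question's data (character columns instead of factors)
myData <- data.frame(
  sample = c("A1", "A1", "A1", "A2", "A2", "A3"),
  event  = c("DEL", "INV", "DEL", "DEL", "INV", "DEL"),
  length = c(30, 10, 30, 10, 20, 40)
)

# Split the rows by sample, keep the first row with the maximum length
# in each group, then bind the single-row pieces back together
longest <- do.call(
  rbind,
  lapply(split(myData, myData$sample), function(d) d[which.max(d$length), ])
)
print(longest)
```

As with slice(which.max(length)), ties are resolved by taking the first maximal row, so A1 keeps a single DEL row of length 30. The resulting `longest` data frame can be passed to ggplot in the same way as the piped result.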
Upvotes: 2