I have a dataframe:
test <- structure(list(Sample_ID = c("S1","S2", "S3", "S4", "S1", "S2", "S3", "S4"),
CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1")), class = c("data.table","data.frame"))
When I plot it with hist(), it works fine:
hist(test$MedianLog2Ratio)
I would like to plot a per-region histogram using ggplot and overlay it with geom_point, with the points coloured by the CN_truth associated with each Sample_ID:
g <- ggplot(test, aes(x = MedianLog2Ratio)) + geom_histogram()
g + geom_point(aes(colour = factor(CN_truth)))
The plot should loosely look like this (with fewer bins, of course, since there is less data), where the legend refers to CN_truth and the title is CN_Region.
Upvotes: 0
One way to do this is as below:
library(ggplot2)

test <- data.frame(Sample_ID = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
                   CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
                   MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
                   CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1"))

# Count how many rows share each MedianLog2Ratio value; this becomes the y position of the points
test <- transform(test, freqmlr = ave(seq(nrow(test)), MedianLog2Ratio, FUN = length))

g <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(color = "black", fill = "white") +
  geom_point(aes(x = MedianLog2Ratio, y = freqmlr, colour = factor(CN_truth))) +
  xlab('MedianLog2Ratio') +
  ylab('Freq') +
  labs(colour = 'CN_truth')
g
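The code above pools both regions into one histogram. For the per-region plot the question asks for, a minimal sketch could count duplicates within each region and facet on CN_Region. This assumes facet panels are acceptable in place of separately titled plots. The names test_region, freq_in_region and g_region are introduced here for illustration only, and bins = 30 is an assumed setting:

```r
library(ggplot2)

test_region <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
  CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
  CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1")
)

# Count how often each MedianLog2Ratio value occurs within its own region
# (hypothetical helper column, used as the y position of the points).
test_region$freq_in_region <- ave(
  seq_len(nrow(test_region)),
  test_region$CN_Region, test_region$MedianLog2Ratio,
  FUN = length
)

# One panel per CN_Region; the strip label of each panel acts as its title.
g_region <- ggplot(test_region, aes(x = MedianLog2Ratio)) +
  geom_histogram(bins = 30, colour = "black", fill = "white") +
  geom_point(aes(y = freq_in_region, colour = factor(CN_truth))) +
  facet_wrap(~ CN_Region) +
  labs(x = "MedianLog2Ratio", y = "Freq", colour = "CN_truth")
g_region
```

The points sit on top of the bars only as long as each distinct MedianLog2Ratio value falls into its own bin. With wider bins, the per-value counts and the bar heights would no longer agree.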
There are many posts on how to remove NA from the legend if you want to (such as this one). Also note that if many points share the same x value, you can shift them slightly within each bar to make them visible, for instance by adding a small random offset:
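g <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(color = "black", fill = "white") +
  # Shift each point right by a random amount in [0, 0.01] so overlapping points separate
  geom_point(aes(x = (MedianLog2Ratio + runif(nrow(test), 0.0, 0.010)), y = freqmlr,
                 colour = CN_truth)) +
  xlab('MedianLog2Ratio') +
  ylab('Freq') +
  labs(colour = 'CN_truth')
# Restricting the limits to "1" and "2" keeps NA out of the legend
g + scale_colour_manual(values = c("red", "blue"), limits = c("1", "2"))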
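As an alternative to adding runif() to x by hand, the same horizontal shift can probably be expressed with ggplot2's position_jitter(), which also takes a seed so the plot is reproducible. This is a sketch only: the width of 0.005, the seed, bins = 30, and the names test_jit and g_jit are assumptions chosen for illustration:

```r
library(ggplot2)

test_jit <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
  CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
  CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1")
)

# Number of rows sharing each MedianLog2Ratio value (pooled over regions).
test_jit$freqmlr <- ave(
  seq_len(nrow(test_jit)),
  test_jit$MedianLog2Ratio,
  FUN = length
)

g_jit <- ggplot(test_jit, aes(x = MedianLog2Ratio)) +
  geom_histogram(bins = 30, colour = "black", fill = "white") +
  geom_point(
    aes(y = freqmlr, colour = CN_truth),
    # Jitter horizontally only (up to 0.005 in either direction); height = 0 keeps y exact.
    position = position_jitter(width = 0.005, height = 0, seed = 1)
  ) +
  scale_colour_manual(values = c("red", "blue"), limits = c("1", "2")) +
  labs(x = "MedianLog2Ratio", y = "Freq", colour = "CN_truth")
g_jit
```

Unlike the runif() version, this jitters in both directions around the true value, and a fixed seed gives the same layout on every run.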
Upvotes: 1