Manasi Shah

Reputation: 437

Plot geom_point on top of geom_histogram

I have a dataframe:

test <- structure(list(Sample_ID = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
                       CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
                       MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
                       CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1")),
                  row.names = c(NA, -8L), class = c("data.table", "data.frame"))

A base-R histogram works fine: hist(test$MedianLog2Ratio)

I would like to plot a per-region histogram with ggplot and overlay it with points (geom_point) coloured by the CN_truth associated with each Sample_ID:

library(ggplot2)
g <- ggplot(test, aes(x = MedianLog2Ratio)) + geom_histogram()
g + geom_point(aes(colour = factor(CN_truth)))

The plot should loosely look like this (with so little data it will of course have fewer bins): sample_plot, where the legend shows CN_truth and the title is the CN_Region.

Upvotes: 0

Views: 1278

Answers (1)

Majid

Reputation: 1854

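One way to do this is to compute, for each row, how many rows share its MedianLog2Ratio value (freqmlr below) and use that count as the y position of the points, so each point sits at roughly the height of its bar: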

library(ggplot2)

test <- data.frame(Sample_ID = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
                   CN_Region = c("A", "A", "A", "A", "B", "B", "B", "B"),
                   MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
                   CN_truth = c("2", "2", "2", NA, "2", "2", "2", "1"))
# freqmlr: number of rows with the same MedianLog2Ratio (the bar height for that value)
test <- transform(test, freqmlr = ave(seq(nrow(test)), MedianLog2Ratio, FUN = length))
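The ave() call gives each row the number of rows that share its MedianLog2Ratio value, which is what places each point at its bar's height (assuming every distinct value falls into its own histogram bin). A small base-R check of what it returns for this data, compared against an equivalent table() lookup:

```r
mlr <- c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3)

# ave() splits the first argument by the grouping values in mlr,
# applies length() to each group and writes the group size back to every row
freq <- ave(seq_along(mlr), mlr, FUN = length)
print(freq)  # 3 3 3 3 3 3 1 1

# The same counts via a lookup into a frequency table
freq_tab <- as.vector(table(mlr)[as.character(mlr)])
stopifnot(all(freq == freq_tab))
```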

g <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(color = "black", fill = "white") +
  # points at the bar height (freqmlr), coloured by CN_truth
  geom_point(aes(y = freqmlr, colour = factor(CN_truth))) +
  xlab("MedianLog2Ratio") +
  ylab("Freq") +
  labs(colour = "CN_truth")
g
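The question asks for one histogram per CN_Region with the region as the title. A sketch of that, assuming facet_wrap() strip labels are an acceptable stand-in for per-plot titles: the counts are recomputed within each region so the points match their own panel's bars, and the bin width of 0.01 is an arbitrary assumed value, not something from the question.

```r
library(ggplot2)

test <- data.frame(
  Sample_ID       = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
  CN_Region       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
  CN_truth        = c("2", "2", "2", NA, "2", "2", "2", "1")
)

# Count rows sharing the same MedianLog2Ratio *within* each region,
# so each point sits at the height of its own panel's bar
test$freq_region <- ave(seq_len(nrow(test)), test$CN_Region, test$MedianLog2Ratio,
                        FUN = length)

bw <- 0.01  # assumed bin width; choose one that suits the real data

p <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(binwidth = bw, colour = "black", fill = "white") +
  geom_point(aes(y = freq_region, colour = CN_truth)) +
  facet_wrap(~ CN_Region) +  # one panel per region; the strip label shows the region
  labs(x = "MedianLog2Ratio", y = "Freq", colour = "CN_truth")
print(p)
```

If separate plots with a real title each are preferred, one could instead loop over split(test, test$CN_Region) and add ggtitle() to each plot.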


Many existing posts cover how to remove NA from the legend (such as this one). Also note that if several points share the same x value they overplot one another; you can shift them slightly within each bar to make them visible, for instance by adding a small random offset:

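g <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(color = "black", fill = "white") +
  # shift each point right by a random amount in [0, 0.01) so overlapping points separate
  geom_point(aes(x = MedianLog2Ratio + runif(nrow(test), 0.0, 0.010), y = freqmlr,
                 colour = CN_truth)) +
  xlab("MedianLog2Ratio") +
  ylab("Freq") +
  labs(colour = "CN_truth")
# limits = c("1", "2") restricts the legend to those two values (no NA entry)
g + scale_colour_manual(values = c("red", "blue"), limits = c("1", "2"))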
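ggplot2 also has a built-in alternative to adding runif() noise by hand: position_jitter(). A sketch, assuming a reasonably recent ggplot2 (the seed argument makes the jitter reproducible): height = 0 keeps each point at its bar's count while width spreads points horizontally in both directions, and na.translate = FALSE on the colour scale is one way to drop NA from the legend. The point with a missing CN_truth is then not drawn, and ggplot2 warns about the removed row. The bin width 0.01 and jitter width 0.003 are arbitrary assumed values.

```r
library(ggplot2)

test <- data.frame(
  Sample_ID       = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
  CN_Region       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  MedianLog2Ratio = c(-0.2, -0.2, -0.25, -0.25, -0.25, -0.2, -0.1, -0.3),
  CN_truth        = c("2", "2", "2", NA, "2", "2", "2", "1")
)
# Rows sharing the same MedianLog2Ratio (the bar height for that value)
test$freqmlr <- ave(seq_len(nrow(test)), test$MedianLog2Ratio, FUN = length)

p2 <- ggplot(test, aes(x = MedianLog2Ratio)) +
  geom_histogram(binwidth = 0.01, colour = "black", fill = "white") +
  # jitter horizontally only (height = 0 keeps y equal to the bar count);
  # seed makes the random offsets reproducible between renders
  geom_point(aes(y = freqmlr, colour = CN_truth),
             position = position_jitter(width = 0.003, height = 0, seed = 1)) +
  # drop NA from the colour scale and legend; NA-coloured points are removed
  scale_colour_manual(values = c("1" = "red", "2" = "blue"), na.translate = FALSE) +
  labs(x = "MedianLog2Ratio", y = "Freq", colour = "CN_truth")
print(p2)
```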


Upvotes: 1
