dejonggr

Reputation: 13

Multi-group histogram with group-specific frequencies

First off, I've already read the following thread: ggplot2 - Multi-group histogram with in-group proportions rather than frequency

I followed the ddply suggestion and it didn't seem to work for my data. Logically the code should work perfectly on my dataset and I have no idea what I'm doing wrong.

Overall: I'd like to make a histogram (I'm learning ggplot) that displays the genotype frequency in each of my study groups.

Something like this:

[image: example plot]

Here's a mock data set that mirrors my own:

df <- data.frame(ID = 1:60,
                 Genotypes = sample(c("CG", "CC", "GG"), 60, replace = TRUE),
                 Study_Group = sample(c("Control", "Pathology1", "pathology2"), 60, replace = TRUE))

I've tried variants of p + geom_bar(aes(y = ..count../sum(..count..))) but R returns "cannot find 'count' object" or something to that effect.

I also tried:

df.new<-ddply(df,.(Study_Group),summarise,
              prop=prop.table(table(df$Genotype)),
              Genotype=names(table(df$Genotype)))

And I believe there was an error with the summarise function, but to be honest, I have no idea what I'm doing.

Is the problem simply my comprehension of the solution or is it something inherently different in my data set?

Thanks for the help.

Upvotes: 1

Views: 5667

Answers (1)

Nick Criswell

Reputation: 1743

Give this a try. I am using dplyr, a package that provides updated versions of the ddply-type functions from plyr. One caveat: I am not sure whether you want Study_Group or Genotypes on the x-axis. Your question asks for the frequency of each genotype within each study group, but your graph has Genotypes on the x-axis. The solution below follows the stated goal, not the plot. Getting Genotypes on the x-axis is a simple change, though, and I note in the code comments where and what to change.
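To make the two possible readings concrete, here is a minimal base R sketch. It is only an illustration: the fixed seed and the names tab, within_group and within_genotype are my own assumptions, and the data is the mock data set from the question.

```r
# Mock data with the same shape as the question's df
set.seed(1)  # assumption: fixed seed so the tables are reproducible
df <- data.frame(
  ID = 1:60,
  Genotypes = sample(c("CG", "CC", "GG"), 60, replace = TRUE),
  Study_Group = sample(c("Control", "Pathology1", "pathology2"), 60, replace = TRUE)
)

# Cross-tabulate: rows are study groups, columns are genotypes
tab <- table(df$Study_Group, df$Genotypes)

# Reading 1: genotype frequency within each study group (each row sums to 1)
within_group <- prop.table(tab, margin = 1)

# Reading 2: study-group frequency within each genotype (each column sums to 1)
within_genotype <- prop.table(tab, margin = 2)

print(round(within_group, 2))
print(round(within_genotype, 2))
```

The only difference between the two is which margin supplies the denominator, which is the same choice the group_by() call makes in the dplyr version.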

library(dplyr)
library(ggplot2)

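df2 <- df %>%
  count(Study_Group, Genotypes) %>% # one row per group/genotype pair, with the count in n
  group_by(Study_Group) %>% # use `group_by(Genotypes) %>%` instead to normalize within each genotype
  mutate(prop = n / sum(n)) # proportions sum to 1 within each group

# swap Study_Group and Genotypes in aes() to put Genotypes on the x-axis
ggplot(data = df2, aes(Study_Group, prop, fill = Genotypes)) +
  geom_bar(stat = "identity", position = "dodge")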

[image: resulting plot]
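If what you actually want is the layout from your picture (Genotypes on the x-axis) while still showing frequencies within each study group, something along these lines might work. Treat it as a sketch under assumptions rather than the definitive version: the seed, the axis label and the object name p are mine, and I have used geom_col(), which is equivalent to geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Mock data from the question
set.seed(1)  # assumption: fixed seed for reproducibility
df <- data.frame(
  ID = 1:60,
  Genotypes = sample(c("CG", "CC", "GG"), 60, replace = TRUE),
  Study_Group = sample(c("Control", "Pathology1", "pathology2"), 60, replace = TRUE)
)

df2 <- df %>%
  count(Study_Group, Genotypes) %>%
  group_by(Study_Group) %>%    # denominator is still the study-group total
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Genotypes on the x-axis, one dodged bar per study group
p <- ggplot(df2, aes(Genotypes, prop, fill = Study_Group)) +
  geom_col(position = "dodge") +
  labs(y = "Proportion within study group")

print(p)
```

Here the bars of one colour add up to 1 across the three genotypes, because the grouping (and therefore the denominator) is unchanged and only the aesthetics are swapped.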

Upvotes: 1
