ggplot2 - create stacked histogram of proportions for individuals, and separate them by population

Question

Essentially, I have a dataset with 4 columns: individuals ("Ind"), the geographic population each individual belongs to ("Pop"), the proportion of its genome assigned to cluster1, and the proportion assigned to cluster2 (the last two add up to 1).

Example:

    Ind <- c(1:20)
    Pop <- rep(1:2, each = 10)
    set.seed(234)
    Cluster1 <- runif(20, 0.0, 1.0)
    Cluster2 <- 1-Cluster1
    df <- data.frame(Ind, Pop, Cluster1, Cluster2)

Data:

       Ind Pop    Cluster1   Cluster2
    1    1   1 0.745619998 0.25438000
    2    2   1 0.781712425 0.21828758
    3    3   1 0.020037114 0.97996289
    4    4   1 0.776085387 0.22391461
    5    5   1 0.066910093 0.93308991
    6    6   1 0.644795124 0.35520488
    7    7   1 0.929385959 0.07061404
    8    8   1 0.717642189 0.28235781
    9    9   1 0.927736510 0.07226349
    10  10   1 0.284230120 0.71576988
    11  11   2 0.555724930 0.44427507
    12  12   2 0.547701653 0.45229835
    13  13   2 0.582847855 0.41715215
    14  14   2 0.582989913 0.41701009
    15  15   2 0.001198341 0.99880166
    16  16   2 0.441117854 0.55888215
    17  17   2 0.313152501 0.68684750
    18  18   2 0.740014466 0.25998553
    19  19   2 0.138326844 0.86167316
    20  20   2 0.871777777 0.12822222

I want to produce a plot with ggplot2 that resembles panel "A" in this figure. In that figure, each individual is a bar showing the proportion of each cluster, the x-axis ticks mark the populations, and vertical grid lines separate the populations. I know I can easily produce a stacked bar plot if I ignore Pop and use melt(), but I would like to know how to incorporate Pop to produce an elegant plot like the one in the link above.
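For comparison, the "ignore Pop" version mentioned above might look like the following minimal sketch (assuming the ggplot2 and reshape2 packages are installed; dfm_simple and p_simple are illustrative names, not from the question). It draws one bar per individual, but nothing on the axis says which population a bar belongs to, which is exactly the missing piece.

```r
library(ggplot2)
library(reshape2)

# Rebuild the 20-individual example data from the question
Ind <- 1:20
Pop <- rep(1:2, each = 10)
set.seed(234)
Cluster1 <- runif(20, 0.0, 1.0)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)

# Drop Pop before melting so only the two cluster columns become rows
dfm_simple <- melt(df[, c("Ind", "Cluster1", "Cluster2")], id.vars = "Ind")

# One bar per individual; the two cluster proportions stack up to 1
p_simple <- ggplot(dfm_simple, aes(Ind, value, fill = variable)) +
  geom_col(width = 1)

# print(p_simple) draws the plot
```

Melting the whole df with id.vars = "Ind" alone would also turn Pop into a "measure" row, which is why it is dropped first here.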

Thanks!

sebkopf · Accepted Answer

How about melting with both Ind and Pop as id variables and graphing it with a facet_grid? It's not 100% like the plot you were looking for but gets pretty close with a few theme adjustments:

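    library(ggplot2)
    library(reshape2)
    library(grid)

    # Keep Ind and Pop as id columns: one row per individual and cluster
    dfm <- melt(df, id.vars = c("Ind", "Pop"))
    ggplot(dfm, aes(Ind, value, fill = variable)) +
        geom_bar(stat = "identity", width = 1) +          # one full-width bar per individual
        facet_grid(~Pop, scales = "free_x") +              # one panel per population
        scale_y_continuous(name = "", expand = c(0, 0)) +
        scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
        theme(
            # linewidth/panel.spacing replace the deprecated size/panel.margin (ggplot2 >= 3.4.0)
            panel.border = element_rect(colour = "black", linewidth = 1, fill = NA),
            strip.background = element_rect(colour = "black", linewidth = 1),
            panel.spacing = unit(0, "cm"),                 # no gap between population panels
            axis.text.x = element_blank()
        )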

[Image: ggplot example]
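The melt step is what makes the faceting possible: with both Ind and Pop as id variables, each individual contributes one row per cluster and keeps its population label, which facet_grid() then splits on. reshape2 is retired in favour of tidyr, so as a sketch (assuming tidyr >= 1.0, which introduced pivot_longer()) the same long table can also be built with tidyr::pivot_longer(). The names dfl, dfm_sorted and dfl_sorted are illustrative only.

```r
library(reshape2)
library(tidyr)

# Same 20-individual example data as in the question
Ind <- 1:20
Pop <- rep(1:2, each = 10)
set.seed(234)
Cluster1 <- runif(20, 0.0, 1.0)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)

# reshape2: Ind and Pop stay as id columns, each cluster column becomes rows
dfm <- melt(df, id.vars = c("Ind", "Pop"))

# tidyr alternative: same columns (Ind, Pop, variable, value), different row order
dfl <- pivot_longer(df, c(Cluster1, Cluster2),
                    names_to = "variable", values_to = "value")

# Sort both by individual and cluster so the rows line up for comparison
dfm_sorted <- dfm[order(dfm$Ind, as.character(dfm$variable)), ]
dfl_sorted <- dfl[order(dfl$Ind, dfl$variable), ]
```

Either table should work in the ggplot() call above; the only difference is that tidyr leaves variable as a character column instead of a factor, which ggplot2 treats as a discrete fill in the same alphabetical order.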

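UPDATE: my example does not cover the more complex case of multiple populations with unequal numbers of individuals. Here is a quick amendment that handles this case with the space = "free_x" argument of facet_grid(), which makes each panel's width proportional to the range of its x scale (here, the number of individuals in that population). Complete code for the example: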

    library(ggplot2)
    library(reshape2)
    library(grid)

    # 30 individuals in three populations of unequal size (5, 15 and 10)
    Ind <- 1:30
    Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))
    set.seed(234)
    Cluster1 <- runif(30, 0.0, 1.0)
    Cluster2 <- 1 - Cluster1
    df <- data.frame(Ind, Pop, Cluster1, Cluster2)

    dfm <- melt(df, id.vars = c("Ind", "Pop"))
    ggplot(dfm, aes(Ind, value, fill = variable)) +
        geom_bar(stat = "identity", width = 1) +
        # space = "free_x": panel widths follow each population's x range
        facet_grid(~Pop, scales = "free_x", space = "free_x") +
        scale_y_continuous(name = "", expand = c(0, 0)) +
        scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
        theme(
            # linewidth/panel.spacing replace the deprecated size/panel.margin (ggplot2 >= 3.4.0)
            panel.border = element_rect(colour = "black", linewidth = 1, fill = NA),
            strip.background = element_rect(colour = "black", linewidth = 1),
            panel.spacing = unit(0, "cm"),
            axis.text.x = element_blank()
        )

[Image: ggplot example2]
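If you would rather mimic the linked figure more literally, with population names as the x ticks on a single panel and vertical lines between populations, one alternative (not part of the answer above, and only a sketch) is to skip faceting, compute each population's first and last individual, and feed those to geom_vline() and the axis breaks. It assumes individuals are numbered consecutively within each population, as in the example; pops, first_ind, last_ind, mid and sep are hypothetical helper names.

```r
library(ggplot2)
library(reshape2)

# Example data from the update: three populations of 5, 15 and 10 individuals
Ind <- 1:30
Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))
set.seed(234)
Cluster1 <- runif(30, 0.0, 1.0)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)
dfm <- melt(df, id.vars = c("Ind", "Pop"))

# Assumes individuals are numbered consecutively within each population
pops <- unique(as.character(df$Pop))
first_ind <- sapply(pops, function(p) min(df$Ind[df$Pop == p]))
last_ind  <- sapply(pops, function(p) max(df$Ind[df$Pop == p]))

mid <- unname((first_ind + last_ind) / 2)         # x position of each population label
sep <- unname(last_ind[-length(last_ind)] + 0.5)  # boundaries between neighbouring populations

p <- ggplot(dfm, aes(Ind, value, fill = variable)) +
  geom_col(width = 1) +
  geom_vline(xintercept = sep, linewidth = 1) +   # linewidth needs ggplot2 >= 3.4.0
  scale_x_continuous(name = NULL, breaks = mid, labels = pops, expand = c(0, 0)) +
  scale_y_continuous(name = NULL, expand = c(0, 0))

# print(p) draws the plot
```

Compared with facet_grid(), this keeps a single panel, so population widths are automatically proportional to their sizes, but the boundaries must be recomputed if the individuals are not sorted by population.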
