Reputation: 127
I have a dataset with four columns: the individual ("Ind"), the geographic population that individual belongs to ("Pop"), and the proportions of its genome assigned to cluster 1 and cluster 2 ("Cluster1" and "Cluster2"; these two sum to 1).
Example:
Ind <- c(1:20)
Pop <- rep(1:2, each = 10)
set.seed(234)
Cluster1 <- runif(20, 0.0, 1.0)
Cluster2 <- 1-Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)
Data:
   Ind Pop    Cluster1   Cluster2
1    1   1 0.745619998 0.25438000
2    2   1 0.781712425 0.21828758
3    3   1 0.020037114 0.97996289
4    4   1 0.776085387 0.22391461
5    5   1 0.066910093 0.93308991
6    6   1 0.644795124 0.35520488
7    7   1 0.929385959 0.07061404
8    8   1 0.717642189 0.28235781
9    9   1 0.927736510 0.07226349
10  10   1 0.284230120 0.71576988
11  11   2 0.555724930 0.44427507
12  12   2 0.547701653 0.45229835
13  13   2 0.582847855 0.41715215
14  14   2 0.582989913 0.41701009
15  15   2 0.001198341 0.99880166
16  16   2 0.441117854 0.55888215
17  17   2 0.313152501 0.68684750
18  18   2 0.740014466 0.25998553
19  19   2 0.138326844 0.86167316
20  20   2 0.871777777 0.12822222
I would like to produce a plot with ggplot2 that resembles the "A" panel in this figure. In that figure, each individual is a bar showing the proportion of each cluster, but the x-axis ticks are the populations and vertical grid lines separate the populations. I know I can easily produce a stacked bar chart if I ignore Pop and use melt(), but I would like to know how to incorporate Pop to produce an elegant plot such as the one in the link above.
Thanks!
Upvotes: 1
Views: 267
Reputation: 2375
How about melting with both Ind and Pop as id variables and plotting with facet_grid? It's not 100% like the plot you were looking for, but it gets pretty close with a few theme adjustments:
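library(ggplot2)
library(reshape2)

# Long format: one row per individual per cluster, keeping Ind and Pop as ids
dfm <- melt(df, id.vars = c("Ind", "Pop"))

ggplot(dfm, aes(Ind, value, fill = variable)) +
  geom_bar(stat = "identity", width = 1) +   # width = 1 removes gaps between bars
  facet_grid(~Pop, scales = "free_x") +      # one panel per population
  scale_y_continuous(name = "", expand = c(0, 0)) +
  scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
  theme(
    # ggplot2 >= 3.4.0 prefers linewidth over size in element_rect
    panel.border = element_rect(colour = "black", size = 1, fill = NA),
    strip.background = element_rect(colour = "black", size = 1),
    panel.spacing = unit(0, "cm"),           # named panel.margin before ggplot2 2.2.0
    axis.text.x = element_blank()
  )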
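To make the melt step above more concrete, here is a minimal sketch that rebuilds the question's example data and inspects the long-format result. It assumes reshape2 is installed; the `sums` variable is only an illustrative name and not part of the original answer.

```r
library(reshape2)

# Rebuild the example data from the question
Ind <- 1:20
Pop <- rep(1:2, each = 10)
set.seed(234)
Cluster1 <- runif(20, 0, 1)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)

# Melt with Ind and Pop as id variables: one row per individual per cluster
dfm <- melt(df, id.vars = c("Ind", "Pop"))

str(dfm)   # columns: Ind, Pop, variable (factor: Cluster1/Cluster2), value
nrow(dfm)  # 40 rows: 20 individuals x 2 clusters

# Sanity check (illustrative): each individual's proportions still sum to 1
sums <- tapply(dfm$value, dfm$Ind, sum)
all(abs(sums - 1) < 1e-12)
```

Because Pop survives the melt as its own column, it is available for faceting, while `variable` drives the fill colour.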
UPDATE: my example does not cover the more complex case of multiple populations with uneven numbers of individuals. A quick amendment handles this case with the space = "free_x" argument of facet_grid. Complete code for the example:
library(ggplot2)
library(reshape2)
library(grid)  # provides unit()

# Three populations with uneven numbers of individuals (5, 15 and 10)
Ind <- c(1:30)
Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))
set.seed(234)
Cluster1 <- runif(30, 0.0, 1.0)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)

# Long format: one row per individual per cluster
dfm <- melt(df, id.vars = c("Ind", "Pop"))

ggplot(dfm, aes(Ind, value, fill = variable)) +
  geom_bar(stat = "identity", width = 1) +
  # space = "free_x" lets panel widths follow each population's x range
  facet_grid(~Pop, scales = "free_x", space = "free_x") +
  scale_y_continuous(name = "", expand = c(0, 0)) +
  scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
  theme(
    # ggplot2 >= 3.4.0 prefers linewidth over size in element_rect
    panel.border = element_rect(colour = "black", size = 1, fill = NA),
    strip.background = element_rect(colour = "black", size = 1),
    panel.spacing = unit(0, "cm"),  # named panel.margin before ggplot2 2.2.0
    axis.text.x = element_blank()
  )
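As a rough check on what space = "free_x" should do here, the sketch below counts individuals per population using base R only. With bars of width 1 and no axis expansion, each panel's x range spans about one unit per individual, so the panel widths should come out roughly proportional to these counts. The `counts` and `shares` names are illustrative and not from the original answer.

```r
# Same population labels as in the example above
Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))

# Number of individuals per population
counts <- table(Pop)
counts

# Approximate share of the total plot width each facet should occupy
shares <- counts / sum(counts)
round(shares, 3)
```

If the facets in the rendered plot do not look roughly like these shares, it is worth checking that both scales = "free_x" and space = "free_x" are set.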
Upvotes: 1