I would like to summarize my "karyotype" molecular data by location and substrate (see sample data below) as percentages, so that I can create a stacked bar plot in ggplot2.
I have figured out how to use 'dcast' to get a count for each karyotype, but cannot figure out how to get the percentage of each of the three karyotypes (i.e. 'BB', 'BD', 'DD').
The result should be in a format suitable for a stacked bar plot in 'ggplot2'.
Sample Data:
library(reshape2)
Karotype.Data <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle", "Steninge"
), class = "factor"), Substrate = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
2L, 2L, 2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle",
"Steninge"), class = "factor"), Karyotype = structure(c(1L, 3L,
4L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 1L, 4L, 3L, 4L, 4L, 3L, 1L, 4L,
3L, 3L, 4L, 3L, 4L, 3L, 3L), .Label = c("", "BB", "BD", "DD"), class = "factor")), .Names = c("Location",
"Substrate", "Karyotype"), row.names = c(135L, 136L, 137L, 138L,
139L, 165L, 166L, 167L, 168L, 169L, 236L, 237L, 238L, 239L, 240L,
326L, 327L, 328L, 329L, 330L, 426L, 427L, 428L, 429L, 430L), class = "data.frame")
## Summary count for each karyotype ##
Karyotype.Summary <- dcast(Karotype.Data , Location + Substrate ~ Karyotype, value.var="Karyotype", length)
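Since the dcast() result is already wide, one way to get percentages from it is to divide each karyotype count by its row total. The sketch below uses base R only and a hand-typed stand-in, `counts_wide` (a hypothetical name), with the BD and DD counts tallied by hand from the sample data and the blank (missing) karyotype left out; with the real Karyotype.Summary you would select its karyotype columns instead.

```r
# Hand-typed stand-in for the dcast() result (hypothetical): one row per
# Location/Substrate pair, one column of counts per karyotype.
# Blank (missing) karyotypes are excluded.
counts_wide <- data.frame(
  Location  = c("Kampinge", "Kampinge", "Kampinge", "Kampinge", "Kaseberga"),
  Substrate = c("Kampinge", "Kaseberga", "Molle", "Steninge", "Kaseberga"),
  BD = c(2, 2, 1, 3, 3),
  DD = c(2, 3, 3, 1, 2),
  stringsAsFactors = FALSE
)

karyo_cols <- c("BD", "DD")

# Divide every count by its row total to get within-group percentages.
pct_wide <- counts_wide
pct_wide[karyo_cols] <- counts_wide[karyo_cols] /
  rowSums(counts_wide[karyo_cols]) * 100

pct_wide
```

ggplot2 still expects long-format data, so this wide result would have to be melted back (for example with reshape2's melt()) before plotting.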
With some help from 'Marat Talipov' and many other answers on Stack Overflow, I found out that it is important to load 'plyr' before 'dplyr' (otherwise plyr's functions mask dplyr's) and to use 'summarise' rather than 'summarize'. The last step was to remove the missing karyotypes (coded as empty strings) with 'filter' before computing the frequencies.
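library(dplyr)
# Count records per Location/Substrate/Karyotype combination
z.counts <- Karotype.Data %>%
  group_by(Location, Substrate, Karyotype) %>%
  summarise(freq = n())
# Drop missing karyotypes, then convert counts to proportions within each Location/Substrate pair
z.freq <- z.counts %>%
  filter(Karyotype != '') %>%
  group_by(Location, Substrate) %>%
  mutate(freq = freq / sum(freq))
z.freq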
library(ggplot2)
ggplot(z.freq, aes(x = Substrate, y = freq, fill = Karyotype)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Location)
This produced the stacked bar plot I was looking for.
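For comparison, the same filter-then-normalize logic can be sketched in base R with aggregate() and ave(), which avoids the plyr/dplyr loading-order issue because neither package is used. This is a swapped-in technique rather than the code above, and `kd` is a hypothetical character-vector rebuild of the sample data (empty string = missing karyotype).

```r
# Hypothetical character-vector rebuild of the sample data
# (empty string = missing karyotype).
kd <- data.frame(
  Location  = rep(c("Kampinge", "Kaseberga"), times = c(20, 5)),
  Substrate = rep(c("Kampinge", "Kaseberga", "Molle", "Steninge", "Kaseberga"),
                  each = 5),
  Karyotype = c("", "BD", "DD", "DD", "BD",   "BD", "DD", "DD", "DD", "BD",
                "", "DD", "BD", "DD", "DD",   "BD", "", "DD", "BD", "BD",
                "DD", "BD", "DD", "BD", "BD"),
  stringsAsFactors = FALSE
)

# Drop missing karyotypes first, as the filter() step does.
kd <- kd[kd$Karyotype != "", ]

# Count each Location/Substrate/Karyotype combination (like group_by + n()).
z_counts <- aggregate(n ~ Location + Substrate + Karyotype,
                      data = transform(kd, n = 1), FUN = sum)

# Divide each count by its Location/Substrate total (like group_by + mutate).
z_counts$freq <- z_counts$n / ave(z_counts$n, z_counts$Location,
                                  z_counts$Substrate, FUN = sum)
z_counts
```

The freq column holds proportions between 0 and 1, like z.freq above.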
You can use the dplyr package:
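library(dplyr)
# Count records per Location/Substrate/Karyotype combination
z.counts <- Karotype.Data %>%
  group_by(Location, Substrate, Karyotype) %>%
  summarize(freq = n())
# Convert counts to percentages within each Location/Substrate pair
z.freq <- z.counts %>%
  group_by(Location, Substrate) %>%
  mutate(freq = freq / sum(freq) * 100)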
Here, the data remain in long format, so it is straightforward to build the bar plot with ggplot:
library(ggplot2)
ggplot(z.freq) +
  aes(x = Karyotype, y = freq) +
  facet_grid(Location ~ Substrate) +
  geom_bar(stat = 'identity')
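Note that this answer's dplyr code does not drop the blank karyotype, so missing records count as a separate category in the percentages. A base-R sketch that drops them and keeps the result in long format uses xtabs() and prop.table() with margin = c(1, 2), i.e. normalizing within each Location/Substrate pair. It assumes `kd`, a hypothetical character-vector rebuild of the sample data; Location/Substrate pairs with no records give 0/0 = NaN and are removed.

```r
# Hypothetical character-vector rebuild of the sample data
# (empty string = missing karyotype).
kd <- data.frame(
  Location  = rep(c("Kampinge", "Kaseberga"), times = c(20, 5)),
  Substrate = rep(c("Kampinge", "Kaseberga", "Molle", "Steninge", "Kaseberga"),
                  each = 5),
  Karyotype = c("", "BD", "DD", "DD", "BD",   "BD", "DD", "DD", "DD", "BD",
                "", "DD", "BD", "DD", "DD",   "BD", "", "DD", "BD", "BD",
                "DD", "BD", "DD", "BD", "BD"),
  stringsAsFactors = FALSE
)
kd <- kd[kd$Karyotype != "", ]  # drop missing karyotypes

# Three-way table of counts.
tab <- xtabs(~ Location + Substrate + Karyotype, data = kd)

# Percentages within each Location/Substrate cell (margins 1 and 2).
pct <- prop.table(tab, margin = c(1, 2)) * 100

# Back to long format; empty Location/Substrate pairs (NaN) are dropped.
z_freq <- as.data.frame(pct, responseName = "freq")
z_freq <- z_freq[!is.nan(z_freq$freq), ]
z_freq
```

The result has Location, Substrate, Karyotype and freq columns, so it can be passed to either ggplot() call above.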