I'm trying to display an "absence/presence" heatmap with geom_tile in R. I would like a tile to be filled as "1" (present) if a feature (here: an OTU) is found in at least one of the samples within a group. Below is example code in which I grouped the samples by site:
library(reshape2)
library(ggplot2)
# presence (1) / absence (0) of each OTU in four samples
df <- data.frame(
  OTU = c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005"),
  Sample1 = c(0, 0, 1, 1, 0),
  Sample2 = c(1, 0, 0, 1, 0),
  Sample3 = c(1, 1, 0, 1, 0),
  Sample4 = c(1, 1, 1, 1, 0))
# long format: one row per OTU/sample pair (OTU is used as the id variable)
molten_df <- melt(df)
# add group data: rows for Sample1 and Sample2 go to site_A, Sample3 and Sample4 to site_B
sites <- data.frame(
  site = c(rep("site_A", 10), rep("site_B", 10)))
molten_df2 <- cbind(molten_df, sites)
# plot heatmap based on group variable sites
ggplot(molten_df2, aes(x = site, y = OTU, fill = value)) +
  geom_tile()
In the resulting plot, the tile (site_A, OTU003) covers the values Sample1 = 1 and Sample2 = 0, and it is filled as 0. On the other hand, the tile (site_B, OTU003) covers Sample3 = 0 and Sample4 = 1, but it is filled as 1. Maybe it uses the last value for the fill? Since I would like to display 1 whenever an OTU appears in any of the grouped samples, regardless of their order, does anyone know how to do this within ggplot2?
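The "last value" guess can be checked on the data itself. Every (site, OTU) tile here is backed by two rows (two samples per site), and geom_tile() draws one rectangle per row, so those rectangles sit exactly on top of each other. Assuming they are drawn in row order, the visible fill is the value from the last row in each group. The sketch below rebuilds molten_df2 with base R's stack() instead of reshape2::melt() (an assumption made only to keep it dependency-free; the row order is the same) and lists the row count and the last value per tile:

```r
df <- data.frame(
  OTU = c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005"),
  Sample1 = c(0, 0, 1, 1, 0),
  Sample2 = c(1, 0, 0, 1, 0),
  Sample3 = c(1, 1, 0, 1, 0),
  Sample4 = c(1, 1, 1, 1, 0)
)

# stack() stands in for melt(): values column by column, Sample1 first
long <- stack(df[, -1])
molten_df2 <- data.frame(
  OTU = rep(df$OTU, times = ncol(df) - 1),
  variable = long$ind,
  value = long$values,
  site = rep(c("site_A", "site_B"), each = 10)
)

# how many rows end up in each (site, OTU) tile
rows_per_tile <- aggregate(value ~ site + OTU, data = molten_df2, FUN = length)
print(rows_per_tile)

# value of the last row in each tile, i.e. the rectangle drawn on top
# if tiles are drawn in data order
last_value <- aggregate(value ~ site + OTU, data = molten_df2,
                        FUN = function(v) v[length(v)])
print(last_value[last_value$OTU == "OTU003", ])
```

For OTU003 the last values are 0 at site_A and 1 at site_B, which matches the fills described above.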
The other way I thought of (but failed to code) is to write a function that sets the remaining values of a given tile to 1 if at least one 1 appears.
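The function idea above can be sketched with base R's ave(), which applies a function within groups and returns a vector as long as its input. Using max() as that function turns every value in an (OTU, site) group into 1 as soon as one sample in the group is 1. The column name present is hypothetical, and the data is again rebuilt with stack() as a stand-in for melt():

```r
library(ggplot2)

df <- data.frame(
  OTU = c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005"),
  Sample1 = c(0, 0, 1, 1, 0),
  Sample2 = c(1, 0, 0, 1, 0),
  Sample3 = c(1, 1, 0, 1, 0),
  Sample4 = c(1, 1, 1, 1, 0)
)
long <- stack(df[, -1])
molten_df2 <- data.frame(
  OTU = rep(df$OTU, times = ncol(df) - 1),
  value = long$values,
  site = rep(c("site_A", "site_B"), each = 10)
)

# within each (OTU, site) group, replace every value by the group maximum;
# for 0/1 data this is 1 if the OTU occurs in at least one sample
molten_df2$present <- ave(molten_df2$value, molten_df2$OTU, molten_df2$site,
                          FUN = max)

# every row behind a tile now has the same value, so draw order no longer matters
p <- ggplot(molten_df2, aes(x = site, y = OTU, fill = factor(present))) +
  geom_tile()
print(p)
```

The overlapping rows are still drawn, just with identical values; collapsing the data to one row per tile avoids the duplicates altogether.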
With the dplyr library, you can create a new variable indicating whether an OTU at a given site is present in at least one sample:
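library(dplyr)
# one row per (OTU, site): PA is 1 if the OTU occurs in any sample of that site
tmp <- group_by(molten_df2, OTU, site) %>%
  summarise(PA = as.factor(ifelse(sum(value) > 0, 1, 0)))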
Then plot:
ggplot(tmp, aes(x = site, y = OTU, fill = PA)) +
  geom_tile()
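If you would rather not load dplyr, the same one-row-per-tile summary can be sketched in base R with aggregate(). For 0/1 data, max() gives 1 exactly when sum(value) > 0, so it matches the ifelse() above. The data setup again uses stack() in place of melt(), and the object name pa is hypothetical:

```r
library(ggplot2)

df <- data.frame(
  OTU = c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005"),
  Sample1 = c(0, 0, 1, 1, 0),
  Sample2 = c(1, 0, 0, 1, 0),
  Sample3 = c(1, 1, 0, 1, 0),
  Sample4 = c(1, 1, 1, 1, 0)
)
long <- stack(df[, -1])
molten_df2 <- data.frame(
  OTU = rep(df$OTU, times = ncol(df) - 1),
  value = long$values,
  site = rep(c("site_A", "site_B"), each = 10)
)

# one row per (OTU, site); max() of 0/1 values is 1 if any sample has the OTU
pa <- aggregate(value ~ OTU + site, data = molten_df2, FUN = max)
# discrete fill so the legend shows 0 and 1 rather than a colour gradient
pa$PA <- factor(pa$value, levels = c(0, 1))

p <- ggplot(pa, aes(x = site, y = OTU, fill = PA)) +
  geom_tile()
print(p)
```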
Or do it directly inside the ggplot() call:
ggplot(group_by(molten_df2, OTU, site) %>%
         summarise(PA = factor(ifelse(sum(value) > 0, 1, 0))),
       aes(x = site, y = OTU, fill = PA)) +
  geom_tile()