user3978632

Reputation: 283

Stacked histogram plot in R

I want to plot a histogram of the count variable with ggplot. However, I want each bar to show the relative fractions of a second (categorical) variable.

For example, the four variables always sum to 1, and I want to plot a histogram based on the count variable.

library(reshape)
library(ggplot2)

values = replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=T)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")

ggplot(mdf, aes(count,fill=variable)) +
  geom_histogram(alpha=0.3,
                 position="identity",lwd=0.2,binwidth=5,boundary=0)

I want each bar of the histogram to be coloured according to the relative fractions of the columns (A, B, C, D), so each bin should show all four categories.

Upvotes: 0

Views: 6322

Answers (2)

user3978632

Reputation: 283

I found the answer with the help of others in this post. I want each bar of the plot to show the fractions of the variables (A, B, C, D). The code is not elegant, but it might be helpful for someone.

library(reshape2)
library(ggplot2)
library(dplyr)

##generate random values and normalise them so that A, B, C and D sum to 1 in each row
##(set.seed comes first so that the values, and not only the counts, are reproducible)
set.seed(2)
values <- matrix(runif(100*4),nrow=100)
S <- apply(values,1,sum); values = values/S
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 100, replace=T)

##frequency of the data in binwidth of 5
table = hist(counts,breaks=seq(0, 100, by = 5),plot=F)$counts

##create a dataframe
df = data.frame(cbind(values,"count"=counts))


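breaks = seq(5, 100, by = 5)
##sum A-D over the rows in each bin (x-5, x], the same right-closed bins that hist() uses
newdf = do.call("rbind",lapply(as.numeric(breaks), function(x) colSums(df[df$count > x-5 & df$count <= x, 1:4])))
##convert to per-bin fractions, then scale by the bin frequency so each stacked bar is as tall as the histogram bar
newdf = melt(sweep(newdf, 1, rowSums(newdf), FUN="/") * table)
colnames(newdf) = c("bins","variable","value")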
ggplot(newdf) +
  geom_bar(aes(x=bins, y=value, fill=variable), stat="identity") +
  theme(axis.text.x=element_text(angle = 90, hjust=1))
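
For anyone who wants a shorter route to a per-bin table of this kind, here is a minimal base-R sketch. It assumes the same sort of toy data as above and swaps the lapply/sweep steps for cut(), rowsum() and prop.table(). The names bin, sums, fractions, freq and heights are only illustrative and are not part of the code above.

```r
set.seed(2)
# Toy data as above: 100 rows, A-D normalised so each row sums to 1
values <- matrix(runif(100 * 4), nrow = 100)
values <- values / rowSums(values)
colnames(values) <- c("A", "B", "C", "D")
counts <- sample(1:100, 100, replace = TRUE)

# Assign every row to a right-closed bin of width 5: (0,5], (5,10], ...
bin <- cut(counts, breaks = seq(0, 100, by = 5))

# Per-bin column sums; only non-empty bins get a row
sums <- rowsum(values, group = bin)

# Within-bin fractions of A-D (each row sums to 1)
fractions <- prop.table(sums, margin = 1)

# Histogram frequency of each non-empty bin, in the same row order
freq <- table(bin)[rownames(sums)]

# Stacked-bar heights: fraction times frequency, row by row
heights <- fractions * as.vector(freq)
```

Each row of heights then adds up to the number of observations in that bin, so the stacked bars would have the same outline as a plain histogram of counts.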

Upvotes: 1

LetEpsilonBeLessThanZero

Reputation: 2403

I think this is what you want (I used the dplyr package as well):

library(reshape2)
library(ggplot2)
library(dplyr)

set.seed(2)
values= replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=T)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")

mdf = mdf %>%
  mutate(binCounts = cut(count, breaks = seq(0, 100, by = 5))) %>%
  group_by(binCounts) %>%
  mutate(sumVal = sum(value)) %>%
  ungroup() %>%
  group_by(binCounts, variable) %>%
  summarise(prct = sum(value)/mean(sumVal))

plot = ggplot(mdf) +
  geom_bar(aes(x=binCounts, y=prct, fill=variable), stat="identity") +
  theme(axis.text.x=element_text(angle = 90, hjust=1))

print(plot)
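
Note that prct is a within-bin proportion, so every stacked bar above has a total height of 1. If the bars should also follow the histogram frequencies, one possible sketch is to multiply each proportion by the number of rows in its bin. This is only an illustration built on the same toy data; the names bin, bin_n, n_rows, total, height and heights are assumptions introduced here, not part of the code above.

```r
library(reshape2)
library(ggplot2)
library(dplyr)

set.seed(2)
# Same toy data as above: four columns A-D plus a count per row
values <- replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) <- c("A", "B", "C", "D")
df <- data.frame(values, count = sample(1:100, 93, replace = TRUE))

# Bin the counts once, before reshaping, so the bin frequency is easy to get
df$bin <- cut(df$count, breaks = seq(0, 100, by = 5))
bin_n <- df %>% count(bin, name = "n_rows")   # rows per bin = histogram frequency

heights <- melt(df, id.vars = c("count", "bin")) %>%
  group_by(bin, variable) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  group_by(bin) %>%
  mutate(prct = total / sum(total)) %>%       # within-bin proportion, as above
  ungroup() %>%
  left_join(bin_n, by = "bin") %>%
  mutate(height = prct * n_rows)              # scale so each bar sums to n_rows

p <- ggplot(heights, aes(x = bin, y = height, fill = variable)) +
  geom_col() +
  labs(x = "count (bins of width 5)", y = "frequency") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(p)
```

geom_col() is used here as the equivalent of geom_bar(stat="identity"). Empty bins are simply absent from the x axis, because they have no rows to summarise.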


Upvotes: 1
