Reputation: 283
I want to plot a histogram of a count variable with ggplot2, but I want each bar to be split to show the relative fractions of a second set of (categorical) variables.
For example, the four variables (A, B, C, D) always sum to 1, and the histogram itself should be based on the count variable.
library(reshape)
library(ggplot2)
values= replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=T)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")
ggplot(mdf, aes(count,fill=variable)) +
geom_histogram(alpha=0.3,
position="identity",lwd=0.2,binwidth=5,boundary=0)
I want each bar of the histogram to be coloured according to the relative fractions of the columns (A, B, C, D), so each bin should contain all four categorical variables.
Upvotes: 0
Views: 6322
Reputation: 283
I found the answer with the help of others in this post. I want each bar of the plot to show the fractions of the variables (A, B, C, D). The code is not elegant, but it might be helpful for someone!
library(reshape2)
library(ggplot2)
library(dplyr)
##generate random values whose rows each sum to 1
##(set the seed first so that both runif() and sample() are reproducible)
set.seed(2)
values <- matrix(runif(100*4),nrow=100)
values <- values/rowSums(values)
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 100, replace=TRUE)
##frequency of the data in binwidth of 5
table = hist(counts,breaks=seq(0, 100, by = 5),plot=F)$counts
##create a dataframe
df = data.frame(cbind(values,"count"=counts))
breaks = seq(5, 100, by = 5)
##sum A-D over the rows that fall in each bin (x-5, x], the same bins hist() uses
newdf = do.call("rbind",lapply(as.numeric(breaks), function(x) colSums(df[which(df$count > x-5 & df$count <= x),1:4])))
newdf = melt(sweep(newdf, 1, rowSums(newdf), FUN="/") * table)
colnames(newdf) = c("bins","variable","value")
ggplot(newdf) +
geom_bar(aes(x=bins, y=value, fill=variable), stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust=1))
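A shorter route may be to let geom_histogram do the binning and pass the fractions in through the weight aesthetic, which stat_bin understands. This is only a sketch: it assumes that every row of A-D sums to 1 (if yours do not, normalise them first), and the name p_weight is just illustrative.

```r
library(ggplot2)
library(reshape2)

set.seed(2)
# Assumption: every row of A-D sums to 1, as described in the question
values <- matrix(runif(100 * 4), nrow = 100)
values <- values / rowSums(values)
colnames(values) <- c("A", "B", "C", "D")
df <- data.frame(values, count = sample(1:100, 100, replace = TRUE))
mdf <- melt(df, id.vars = "count")

# weight = value makes each melted row contribute its fraction instead of 1,
# so the total height of a stacked bar is the number of observations in that bin
p_weight <- ggplot(mdf, aes(x = count, weight = value, fill = variable)) +
  geom_histogram(binwidth = 5, boundary = 0, position = "stack")
print(p_weight)
```

Because the four fractions of a row add up to 1, the stacked segments of a bin add up to that bin's ordinary histogram count, and each segment is that count split in proportion to the summed fractions.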
Upvotes: 1
Reputation: 2403
I think this is what you want (I used the dplyr package as well):
library(reshape2)
library(ggplot2)
library(dplyr)
set.seed(2)
values= replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=T)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")
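mdf = mdf %>%
  # assign every row to a bin of width 5
  mutate(binCounts = cut(count, breaks = seq(0, 100, by = 5))) %>%
  # total of all values that fall in the bin
  group_by(binCounts) %>%
  mutate(sumVal = sum(value)) %>%
  ungroup() %>%
  # share of each variable within its bin
  group_by(binCounts, variable) %>%
  summarise(prct = sum(value)/mean(sumVal))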
plot = ggplot(mdf) +
geom_bar(aes(x=binCounts, y=prct, fill=variable), stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust=1))
print(plot)
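The same proportions can probably also be obtained without normalising by hand, by summing each variable per bin and letting position = "fill" rescale every bar to a height of 1. A rough sketch under that assumption, using base aggregate() instead of dplyr; the names sums and p_fill are made up for this example.

```r
library(ggplot2)
library(reshape2)

set.seed(2)
# Same simulated data as in the question
values <- replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) <- c("A", "B", "C", "D")
df <- data.frame(values, count = sample(1:100, 93, replace = TRUE))

mdf <- melt(df, id.vars = "count")
mdf$bin <- cut(mdf$count, breaks = seq(0, 100, by = 5))

# Sum each variable within each bin (empty bins are simply dropped)
sums <- aggregate(value ~ bin + variable, data = mdf, FUN = sum)

# position = "fill" stacks the four sums and rescales each bar to total 1
p_fill <- ggplot(sums, aes(x = bin, y = value, fill = variable)) +
  geom_col(position = "fill") +
  ylab("fraction") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p_fill)
```

This only changes who does the division: ggplot2 normalises the stacked values at draw time, so the data frame keeps the raw per-bin sums.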
Upvotes: 1