Reputation: 79
I can't figure out how to make the bars of a histogram appear in descending order with ggplot.
Here's my code, with a data frame that everyone can use:
library(ggplot2)
library(scales)
chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"),
                   header = TRUE)
ggplot(chol) +
  geom_histogram(aes(x = AGE, y = ..ncount.., fill = ..ncount..),
                 breaks = seq(20, 50, by = 2),
                 col = "red",
                 alpha = .2) +
  scale_fill_gradient("Percentage", low = "green", high = "red") +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Histogram for Age") +
  labs(x = "Age", y = "Percentage")
This is the resulting histogram, which I want in descending order:
I tried to order the AGE column before plotting:
## set the levels in the order we want
chol <- within(chol,
               AGE <- factor(AGE,
                             levels = names(sort(table(AGE),
                                                 decreasing = TRUE))))
I get an error when I plot the reordered AGE with ggplot and geom_histogram.
Upvotes: 3
Views: 18187
Reputation: 823
While I wouldn't recommend this because it shuffles the ages on the x-axis, you can split the data into groups based on age (using the cut function), reorder the resulting factor by frequency, and then plot it as a bar chart:
library(dplyr) # provides %>%, mutate() and filter()
#Add a new column for the "bins"
chol <- chol %>% mutate(AGE2 = cut(AGE,
                                   breaks = seq(min(AGE), max(AGE), by = 2),
                                   right = FALSE))
#Reorders the factor by count
chol$AGE3 <- reorder(chol$AGE2, chol$AGE, FUN = function(x) 100-length(x))
#Makes the chart
chol %>% filter(AGE >= 20 & AGE < 50) %>% #This and the cut replace breaks
ggplot() +
geom_bar(aes(x = AGE3,
y = ..count../max(..count..), #Gives same percents on y-axis
fill = ..count..), #Gives same percents on the scale
col = "red",
alpha = .2) +
scale_fill_gradient("Percentage", low = "green", high = "red") +
scale_y_continuous(labels = percent_format()) +
labs(title = "Histogram for Age") +
labs(x = "Age", y = "Percentage")
The y-axis percentages don't make much sense here: the tallest group is shown as 100%, but 100% of what?
Also, you still need to relabel the groups. [20,22) means the group includes values greater than or equal to 20 and less than 22 (see the Wikipedia page on interval notation).
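One possible way to do that relabeling is to pass your own labels to cut(). This is only a sketch: the `ages`, `brks` and `lbls` names and the toy values are made up for illustration and are not the chol data.

```r
# Hypothetical ages, for illustration only (not the chol data)
ages <- c(20, 21, 23, 24, 25)
brks <- seq(20, 26, by = 2)

# Build "20-22"-style labels from consecutive break points
lbls <- paste(head(brks, -1), tail(brks, -1), sep = "-")

# right = FALSE keeps the same [lower, upper) bins as before,
# only the displayed labels change
bins <- cut(ages, breaks = brks, right = FALSE, labels = lbls)
levels(bins)
```

Note that with right = FALSE the upper bound of each label is still exclusive, so an age of 22 falls in the "22-24" group, not in "20-22".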
Upvotes: 0
Reputation: 206616
First, I've gotta say I think this could be a very confusing plot if you shuffle the x-axis; most people would assume that the ages are sorted in increasing order.
But if this is really what you want to do, geom_histogram() isn't going to help here. It's better to do the data summary yourself and use ggplot only for plotting. Here's one way to generate the data for your plot:
# helper function
pairjoin <- function(x) paste(head(x,-1), tail(x,-1), sep="-")
# use the base hist() function to calculate BINs
dd <- with(hist(chol$AGE, breaks = seq(10, 60, by = 5), plot = FALSE),
           data.frame(N = counts,
                      age = pairjoin(breaks),
                      PCT = counts / sum(counts)))
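To make it clearer what this step produces, here is the same idea on a tiny made-up vector. The `toy_age`, `toy_dd` and `ord` names and the values are assumptions for illustration only; the real chol data will give different counts.

```r
# Same helper as above: joins consecutive breaks into "lower-upper" labels
pairjoin <- function(x) paste(head(x, -1), tail(x, -1), sep = "-")

# Hypothetical ages, not the chol data
toy_age <- c(11, 16, 17, 18)

# Let hist() do the binning without drawing anything
h <- hist(toy_age, breaks = c(10, 15, 20), plot = FALSE)

toy_dd <- data.frame(N = h$counts,
                     age = pairjoin(h$breaks),
                     PCT = h$counts / sum(h$counts))

# reorder() by -PCT puts the most frequent bin first
ord <- reorder(toy_dd$age, -toy_dd$PCT)
levels(ord)
```

The level order of the reordered factor is what decides the left-to-right order of the bars when it is mapped to x.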
Now that we have the data we need, we can draw the plot:
ggplot(dd) +
geom_bar(aes(reorder(age, -PCT), PCT, fill=PCT),
col="red", alpha = .2, stat="identity") +
scale_fill_gradient("Percentage", low = "green", high = "red") +
scale_y_continuous(labels = percent_format()) +
labs(title="Histogram for Age") +
labs(x="Age", y="Percentage")
This will make the following plot:
Upvotes: 4