PiecesOfMagics

Reputation: 79

R ggplot histogram bars in descending order

I can't figure out how to make the bars of a histogram appear in descending order with ggplot.

Here's my code, with a data frame that everyone can use:

library(ggplot2)
library(scales)


chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"), 
header = TRUE)
ggplot(chol) +
geom_histogram(aes(x = AGE, y = ..ncount.., fill = ..ncount..),
               breaks=seq(20, 50, by = 2),
               col="red",
               alpha = .2) +
scale_fill_gradient("Percentage", low = "green", high = "red") +
scale_y_continuous(labels = percent_format()) +
labs(title="Histogram for Age") +
labs(x="Age", y="Percentage")

This produces a histogram, and I want its bars to appear in descending order.

I tried ordering the AGE column before plotting:

## set the levels in the order we want
chol <- within(chol, 
               AGE <- factor(AGE, 
                                  levels=names(sort(table(AGE), 
                                                    decreasing=TRUE))))

I get an error when I plot the reordered AGE with ggplot and geom_histogram.

Upvotes: 3

Views: 18187

Answers (2)

Andrew Jackson

Reputation: 823

While I wouldn't recommend this because it shuffles the x-axis ages, you can split the data up into new groups based on the age (using the cut function), reorder the resulting factor by frequency and then plot it as a bar chart:

#Load dplyr for %>%, mutate() and filter()
library(dplyr)

#Add a new column for the "bins"
chol <- chol %>% mutate(AGE2 = cut(AGE,
                           breaks = seq(min(AGE), max(AGE), by = 2),
                           right = FALSE))

#Reorders the factor by count (largest group first)
chol$AGE3 <- reorder(chol$AGE2, chol$AGE, FUN = function(x) -length(x))

#Makes the chart
chol %>% filter(AGE >= 20 & AGE < 50) %>% #This and the cut replace breaks
ggplot() +
  geom_bar(aes(x = AGE3,
               y = ..count../max(..count..), #Gives same percents on y-axis
               fill = ..count..), #Gives same percents on the scale
               col = "red",
               alpha = .2) +
  scale_fill_gradient("Percentage", low = "green", high = "red") + 
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Histogram for Age") +
  labs(x = "Age", y = "Percentage")

The y-axis percentages don't make much sense here: because each count is divided by the largest count, the tallest group is always shown as 100%, but 100% of what?

Also, you still need to relabel the groups. [20,22) means that it includes values greater than or equal to 20 and less than 22 (see Interval Notation Wikipedia Page).
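As a rough sketch of that relabeling, cut() accepts a labels argument. The ages below are made up for illustration (they are not from chol.txt), and the "20-21" style of label is my own assumption that only makes sense for whole-number ages:

```r
# Hypothetical ages, for illustration only (not taken from chol.txt)
ages <- c(20, 21, 23, 24, 25, 25)
breaks <- seq(20, 26, by = 2)

# Build "lower-upper" labels, where upper is the last whole age in the bin.
# This assumes integer ages; with fractional ages keep the interval notation.
bin_labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")

# right = FALSE keeps the [lower, upper) bins used above
bins <- cut(ages, breaks = breaks, right = FALSE, labels = bin_labels)
table(bins)
```

The same labels vector could be passed to the cut() call that creates AGE2, as long as its length is one less than the number of breaks.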

Upvotes: 0

MrFlick

Reputation: 206616

First I've gotta say I think this can potentially be a very confusing plot if you are shuffling the x-axis; I think most people would assume that ages are sorted in increasing order.

But if this is really what you want to do, geom_histogram() really isn't going to help here. Better to do the data summary yourself and just use ggplot for plotting. Here's one way to generate the data for your plot:

# helper function: label each bin as "lower-upper"
pairjoin <- function(x) paste(head(x,-1), tail(x,-1), sep="-")
# use the base hist() function to calculate the bins
dd <- with(hist(chol$AGE, breaks=seq(10, 60, by = 5), plot=FALSE),
           data.frame(N=counts, age=pairjoin(breaks), PCT=counts/sum(counts)))
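To make the summary step easier to follow, here is a small self-contained sketch of the same idea on made-up ages (not the chol data) with wider, 10-year bins chosen only to keep the output short. It shows what hist(plot = FALSE) returns and how the rows end up ordered by share:

```r
# Hypothetical ages, for illustration only (not taken from chol.txt)
ages <- c(12, 18, 23, 27, 28, 31, 33, 34, 36, 52)

# Same helper as above: label each bin as "lower-upper"
pairjoin <- function(x) paste(head(x, -1), tail(x, -1), sep = "-")

# hist() with plot = FALSE only computes the bins; nothing is drawn
h <- hist(ages, breaks = seq(10, 60, by = 10), plot = FALSE)

# One row per bin, with the count and its share of all observations
dd_demo <- data.frame(age = pairjoin(h$breaks),
                      N   = h$counts,
                      PCT = h$counts / sum(h$counts))

# Sort rows by descending share, the same order reorder(age, -PCT) gives the bars
dd_demo[order(-dd_demo$PCT), ]
```

Note that here PCT is a share of the total, so the bars sum to 100%, unlike ..ncount.. in the question, which scales the tallest bar to 100%.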

Now that we have the data we need, we can draw the plot:

ggplot(dd) +
geom_bar(aes(reorder(age, -PCT), PCT, fill=PCT),
    col="red", alpha = .2, stat="identity") +
scale_fill_gradient("Percentage", low = "green", high = "red") +
scale_y_continuous(labels = percent_format()) +
labs(title="Histogram for Age") +
labs(x="Age", y="Percentage")

This produces a bar chart with the age bins ordered from the highest percentage to the lowest.

Upvotes: 4
