nm44

Reputation: 45

geom_histogram to plot counts/accumulation of each x value and higher

I am trying to create a histogram/bar plot in R that shows, for each x value in my dataset, the count of rows with that value or higher. I am having trouble doing this, and I don't know whether I should use geom_histogram or geom_bar (I want to use ggplot2). To describe my problem further:

On the X axis I have "Percent_Origins," which is a column in my data frame. On the Y axis, for each Percent_Origins value that occurs, I want the height of the bar to represent the count of rows with that percent value or higher. Right now, using a histogram, I have:

    plot <- ggplot(dataframe, aes(x = Percent_Origins)) +
      geom_histogram(aes(fill = Percent_Origins), binwidth = 0.05, colour = "white")

How should I change the fill or the code in general to do what I want, that is, plot an accumulation of the counts of each value and higher? Thanks!

Upvotes: 1

Views: 751

Answers (1)

Mark Peterson

Reputation: 9570

I think your best bet is to compute the cumulative counts first and then pass them to ggplot. There are several ways to do this, but a simple one (using dplyr) is to sort the data in descending order, then assign a running count to each row. Because of the sort order, that running count is the number of rows with that value or higher. Trim the data so that only the largest count for each value is kept, then plot it.

To demonstrate, I am using the built-in iris data.

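library(dplyr)
library(ggplot2)

iris %>%
  arrange(desc(Sepal.Length)) %>%   # largest values first
  mutate(counts = 1:n()) %>%        # running count of rows seen so far
  group_by(Sepal.Length) %>%
  slice(n()) %>%                    # keep the last (largest) count per value
  ggplot(aes(x = Sepal.Length, y = counts)) +
  geom_step(direction = "vh")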

gives:

[Image: step plot of counts against Sepal.Length, decreasing as Sepal.Length increases]
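The same idea should carry over to the data frame in the question. Here is a minimal sketch, assuming a data frame with a numeric Percent_Origins column; the values below are made up for illustration, and only the column name comes from the question. It uses count() plus a reversed cumsum() as an alternative to the sort-and-slice pipeline above:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data (values are invented)
dataframe <- data.frame(
  Percent_Origins = c(0.10, 0.10, 0.25, 0.40, 0.40, 0.40, 0.75)
)

cum_counts <- dataframe %>%
  count(Percent_Origins) %>%             # rows per distinct value, ascending
  mutate(counts = rev(cumsum(rev(n))))   # rows with this value or higher

p <- ggplot(cum_counts, aes(x = Percent_Origins, y = counts)) +
  geom_step(direction = "vh")
```

Printing p draws the plot. Inspecting cum_counts first is a quick way to check that the counts are what you expect before plotting.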

If you really want bars instead of a line, use geom_col instead. However, note that you either need to fill in gaps (to ensure the bars are evenly spaced across the range) or deal with breaks in the plot.
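As a rough sketch of the "fill in gaps" option, one approach is to build an evenly spaced grid over the observed range and count the rows at or above each grid value. The 0.1 step is an assumption that happens to match the resolution of Sepal.Length in iris, and the name filled is just illustrative; for other data you would need to pick a step that suits it.

```r
library(ggplot2)

# Evenly spaced grid over the observed range; the 0.1 step is an assumption
# that matches the measurement resolution of iris$Sepal.Length
filled <- data.frame(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), by = 0.1)
)

# For every grid value, count the rows at that value or higher
# (the small tolerance guards against floating-point error from seq())
filled$counts <- sapply(
  filled$Sepal.Length,
  function(v) sum(iris$Sepal.Length >= v - 1e-8)
)

p <- ggplot(filled, aes(x = Sepal.Length, y = counts)) +
  geom_col()
```

Grid values that never occur in the data still get a bar, with the same height as the next higher value that does occur, so the bars are evenly spaced with no breaks.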

Upvotes: 1
