Reputation: 45
I am trying to create a histogram/bar plot in R to show the counts of each x value I have in the dataset and higher. I am having trouble doing this, and I don't know if I use geom_histogram or geom_bar (I want to use ggplot2). To describe my problem further:
On the X axis I have "Percent_Origins," which is a column in my data frame. On my Y axis - for each of the Percent_Origin values I have occurring, I want the height of the bar to represent the count of rows with that percent value and higher. Right now, if I am to use a histogram, I have:
plot <- ggplot(dataframe, aes(x=dataframe$Percent_Origins)) +
geom_histogram(aes(fill=Percent_Origins), binwidth= .05, colour="white")
What should I change the fill or general code to be to do what I want? That is, plot an accumulation of counts of each value and higher? Thanks!
Upvotes: 1
Views: 751
Reputation: 9570
I think that your best bet is going to be creating the cumulative distribution function first then passing it to ggplot
. There are several ways to do this, but a simple one (using dplyr
) is to sort the data (in descending order), then just assign a count for each. Trim the data so that only the largest count is still included, then plot it.
To demonstrate, I am using the builtin iris
data.
iris %>%
arrange(desc(Sepal.Length)) %>%
mutate(counts = 1:n()) %>%
group_by(Sepal.Length) %>%
slice(n()) %>%
ggplot(aes(x = Sepal.Length, y = counts)) +
geom_step(direction = "vh")
gives:
If you really want bars instead of a line, use geom_col
instead. However, note that you either need to fill in gaps (to ensure the bars are evenly spaced across the range) or deal with breaks in the plot.
Upvotes: 1