Steven

Graph proportion by category in ggplot2

I am trying to graph the proportion of people in Remission (a binary 0/1 variable) after treatment, by year. I can find how to graph the count, but I would like the proportion, because the number of people differs from year to year.

My data look something like this:

Client_id Year Remission
2 2016 0
4 2017 1
7 2017 0
8 2016 1
12 2016 1
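For anyone who wants to run the code in this thread, the sample rows above could be entered as a data frame along these lines. This is only a sketch of the five rows shown, and the name df is taken from the code used further down.

```r
# Sample rows from the table above; the real data set has more clients.
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# Number of clients per year (it differs by year, hence the need for proportions)
table(df$Year)
```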

I would like to create a plot with Year on the x-axis and the proportion of those in remission on the y-axis. Ideally, I would be able to do this both using geom_bar and geom_line.

I have tried this code, but it gives a proportion of 1.00 for every year, which is not correct.

ggplot(data=df)+
  geom_bar(aes(x=Year,y=Remission),stat="identity",position="dodge")

I could calculate this manually for each year and create a table in Excel, but I am hoping for a way to do it directly in ggplot2.

Answers (1)

Quinten

You could map fill = Remission in your ggplot aesthetics and use position = "fill" in geom_bar, which stacks the counts and rescales each bar to a total height of 1, so the segments show proportions:

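library(dplyr)
library(ggplot2)
df %>%
  # Treat Year as discrete and Remission as a category so it can be mapped to fill
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x=Year, fill = Remission)) +
  # position = "fill" rescales each year's stacked counts to sum to 1
  geom_bar(position = "fill") +
  labs(y = "Proportion")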

Created on 2022-08-22 with reprex v2.0.2
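Because Remission is coded 0/1, its mean within a year equals the proportion in remission, so an alternative is to let stat_summary compute that mean. This draws a single bar per year rather than a filled stack. A minimal sketch, assuming Remission is still numeric (not converted to a factor) and using the five sample rows from the question as df; remission_by_year is a name introduced here only to check the numbers:

```r
library(ggplot2)

# Sample rows from the question (assumed; the real data has more clients)
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# The mean of a 0/1 variable is the proportion of 1s
remission_by_year <- tapply(df$Remission, df$Year, mean)
remission_by_year

# stat_summary applies mean() to Remission within each Year
# (the fun argument requires ggplot2 >= 3.3.0)
p_bar <- ggplot(df, aes(x = factor(Year), y = Remission)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(x = "Year", y = "Proportion in remission")
p_bar
```

Note that this only works while Remission is numeric; once it is converted with as.factor, mean() no longer applies.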

Percentage scale

If you want a percentage scale, you can pass percent_format() from the scales package to the labels argument of scale_y_continuous like this:

library(dplyr)
library(ggplot2)
library(scales)
df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x=Year, fill = Remission)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels=percent_format()) +
  labs(y = "Proportion")


Proportion with geom_line

You can do this by calculating the proportions first: count the rows per Year and Remission, then group by Year and use mutate to divide each count by that year's total. Then plot the result like this:

library(dplyr)
library(ggplot2)
df %>%
  mutate(Year = as.numeric(Year),
         Remission = as.factor(Remission)) %>%
  # One row per Year/Remission combination, with its count in n
  count(Year, Remission) %>%
  group_by(Year) %>%
  # Share of each Remission value within its year
  mutate(prop = n/sum(n)) %>%
  ungroup() %>%
  ggplot(aes(x=Year, y = prop, color = Remission)) +
  geom_line() +
  scale_x_continuous(breaks = c(2016,2017)) +
  labs(y = "Proportion")

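If only the proportion in remission is wanted, as a single line rather than one line per Remission value, one option is to summarise to one row per year and plot that. A sketch under the same assumptions as the question's sample (numeric 0/1 Remission, five sample rows as df); prop_df and p_line are hypothetical names used only for illustration:

```r
library(dplyr)
library(ggplot2)

# Sample rows from the question (assumed; the real data has more clients)
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# One row per year: mean of the 0/1 variable = proportion in remission
prop_df <- df %>%
  group_by(Year) %>%
  summarise(prop = mean(Remission), n = n(), .groups = "drop")

p_line <- ggplot(prop_df, aes(x = Year, y = prop)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(prop_df$Year)) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(y = "Proportion in remission")
p_line
```

The same prop_df could be passed to geom_col() instead of geom_line() to get bars, and the n column keeps the yearly group size visible in case small years need to be flagged.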
