Meleana

Reputation: 3

Graphing frequency of categorical variable by month in ggplot2

Here is an example of the data I'm working with:

   SMUGGLING VIOLENCE AUXILIARY      Month
        Yes       No        No 2017-07-01
        Yes       No       Yes 2017-03-01
        Yes       No        No 2017-05-01
        Yes       No        No 2017-02-01
        Yes     <NA>      <NA> 2016-02-01

I am trying to graph the frequency of SMUGGLING == "Yes" over time (by month, 2016-2017). I have already subsetted the data to take out the No's, so I would just like the count of SMUGGLING on the y-axis and the month on the x-axis.

Here is my code:

ggplot(data = smugglingyes,
   aes(Month, SMUGGLING)) +
  stat_summary(fun.y = sum, 
           geom = "line") +
  scale_x_date(date_labels="%Y-%m", date_breaks = "1 month")

This is just a rough example of the output I'm getting from ggplot2 (it will be cleaned up once I figure out the right way to graph this).

I'm confused about whether this is displaying the counts, which is what I thought the stat_summary part of the code would do. The "Yes" on the y-axis is misleading, and there aren't any numbers on the y-axis. Any idea how to fix this graph?

Upvotes: 0

Views: 1367

Answers (1)

erocoar

Reputation: 5893

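SMUGGLING is a categorical column, so mapping it to y gives a discrete axis labelled "Yes" rather than numbers. Why not aggregate the data before passing it to ggplot? E.g.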

library(tidyverse)

# Recreate the sample data from the question (already subsetted to "Yes")
df <- data.frame(Smuggling = "Yes", 
                 Violence = "No",
                 Auxiliary = c("No", "Yes", "No", "No", NA),
                 Month = c("2017-07-01", "2017-03-01", "2017-05-01", "2017-02-01", "2016-02-01"))

df %>% 
  mutate(Month = lubridate::ymd(Month)) %>%  # parse the strings as dates
  count(Month) %>%                           # one row per month, count in column n
  ggplot(aes(x = Month, y = n)) + geom_line()
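
One thing to watch with the pipeline above: count() only returns months that actually occur in the data, so geom_line() draws straight through any month with no records. If you would rather show those months as zero, a rough sketch along these lines might help. It assumes the question's smugglingyes data frame (rebuilt here from the five sample rows, which is my own reconstruction) and that Month is already a Date; the names monthly and p are just placeholders.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Assumed stand-in for the question's subsetted data (five sample rows only)
smugglingyes <- data.frame(
  SMUGGLING = "Yes",
  Month = as.Date(c("2017-07-01", "2017-03-01", "2017-05-01",
                    "2017-02-01", "2016-02-01"))
)

monthly <- smugglingyes %>%
  count(Month) %>%
  # add a row for every month between the first and last one seen,
  # with n = 0 where there were no records
  complete(Month = seq(min(Month), max(Month), by = "month"),
           fill = list(n = 0L))

p <- ggplot(monthly, aes(x = Month, y = n)) +
  geom_line() +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")
p
```

Whether zero-filling is appropriate depends on whether a missing month really means "no incidents" or just "no data collected", so treat this as one option rather than the default.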

Upvotes: 1
