Vinod
Vinod

Reputation: 9

How to create density graphs / curves on counted data?

I want to create density graphs like this in R using ggplot with data.-

![enter image description here] 1

Data is like this:

Year    Month   Crop    Incidence
2015    Jan Maize   6.00
2015    Feb Maize   10.00
2015    March   Maize   5.00
2015    May Maize   5.00
2015    June    Maize   7.00
2015    July    Maize   8.00
2015    Aug Maize   15.00
2015    Sep Maize   11.00
2015    Oct Maize   14.00
2015    Nov Maize   7.00
2015    Dec Maize   6.00
2016    Jan Maize   8.00
2016    Feb Maize   10.00
2016    March   Maize   11.00
2016    May Maize   5.00
2016    June    Maize   7.00
2016    July    Maize   9.00
2016    Aug Maize   12.00
2016    Sep Maize   12.00
2016    Oct Maize   13.00
2016    Nov Maize   7.00
2016    Dec Maize   6.00
2017    Jan Maize   5.00
2017    Feb Maize   8.00
2017    March   Maize   12.00
2017    May Maize   3.00
2017    June    Maize   7.00
2017    July    Maize   8.00
2017    Aug Maize   9.00
2017    Sep Maize   11.00
2017    Oct Maize   14.00
2017    Nov Maize   9.00
2017    Dec Maize   7.00
2018    Jan Maize   7.00
2018    Feb Maize   11.00
2018    March   Maize   13.00
2018    May Maize   2.00
2018    June    Maize   7.00
2018    July    Maize   8.00
2018    Aug Maize   15.00
2018    Sep Maize   10.00
2018    Oct Maize   12.00
2018    Nov Maize   8.00
2018    Dec Maize   6.00
2019    Jan Maize   8.00
2019    Feb Maize   8.00
2019    March   Maize   13.00
2019    May Maize   4.00
2019    June    Maize   9.00
2019    July    Maize   10.00
2019    Aug Maize   11.00
2019    Sep Maize   10.00
2019    Oct Maize   14.00
2019    Nov Maize   7.00
2019    Dec Maize   7.00
2015    Jan Sorghum 7.00
2015    Feb Sorghum 8.00
2015    March   Sorghum 13.00
2015    May Sorghum 3.00
2015    June    Sorghum 7.00
2015    July    Sorghum 8.00
2015    Aug Sorghum 14.00
2015    Sep Sorghum 12.00
2015    Oct Sorghum 13.00
2015    Nov Sorghum 9.00
2015    Dec Sorghum 7.00
2016    Jan Sorghum 7.00
2016    Feb Sorghum 11.00
2016    March   Sorghum 12.00
2016    May Sorghum 4.00
2016    June    Sorghum 8.00
2016    July    Sorghum 8.00
2016    Aug Sorghum 11.00
2016    Sep Sorghum 12.00
2016    Oct Sorghum 13.00
2016    Nov Sorghum 8.00
2016    Dec Sorghum 7.00
2017    Jan Sorghum 7.00
2017    Feb Sorghum 8.00
2017    March   Sorghum 12.00
2017    May Sorghum 2.00
2017    June    Sorghum 7.00
2017    July    Sorghum 9.00
2017    Aug Sorghum 11.00
2017    Sep Sorghum 12.00
2017    Oct Sorghum 14.00
2017    Nov Sorghum 9.00
2017    Dec Sorghum 6.00
2018    Jan Sorghum 7.00
2018    Feb Sorghum 7.00
2018    March   Sorghum 11.00
2018    May Sorghum 3.00
2018    June    Sorghum 7.00
2018    July    Sorghum 9.00
2018    Aug Sorghum 10.00
2018    Sep Sorghum 10.00
2018    Oct Sorghum 12.00
2018    Nov Sorghum 7.00
2018    Dec Sorghum 7.00
2019    Jan Sorghum 5.00
2019    Feb Sorghum 8.00
2019    March   Sorghum 13.00
2019    May Sorghum 5.00
2019    June    Sorghum 7.00
2019    July    Sorghum 9.00
2019    Aug Sorghum 15.00
2019    Sep Sorghum 10.00
2019    Oct Sorghum 12.00
2019    Nov Sorghum 8.00
2019    Dec Sorghum 7.00
2015    Jan Pearl millet    7.00
2015    Feb Pearl millet    8.00
2015    March   Pearl millet    12.00
2015    May Pearl millet    4.00
2015    June    Pearl millet    9.00
2015    July    Pearl millet    10.00
2015    Aug Pearl millet    13.00
2015    Sep Pearl millet    10.00
2015    Oct Pearl millet    14.00
2015    Nov Pearl millet    7.00
2015    Dec Pearl millet    6.00
2016    Jan Pearl millet    8.00
2016    Feb Pearl millet    8.00
2016    March   Pearl millet    12.00
2016    May Pearl millet    2.00
2016    June    Pearl millet    8.00
2016    July    Pearl millet    10.00
2016    Aug Pearl millet    13.00
2016    Sep Pearl millet    11.00
2016    Oct Pearl millet    14.00
2016    Nov Pearl millet    8.00
2016    Dec Pearl millet    7.00
2017    Jan Pearl millet    5.00
2017    Feb Pearl millet    11.00
2017    March   Pearl millet    13.00
2017    May Pearl millet    3.00
2017    June    Pearl millet    9.00
2017    July    Pearl millet    10.00
2017    Aug Pearl millet    10.00
2017    Sep Pearl millet    10.00
2017    Oct Pearl millet    13.00
2017    Nov Pearl millet    8.00
2017    Dec Pearl millet    6.00
2018    Jan Pearl millet    8.00
2018    Feb Pearl millet    11.00
2018    March   Pearl millet    13.00
2018    May Pearl millet    5.00
2018    June    Pearl millet    7.00
2018    July    Pearl millet    9.00
2018    Aug Pearl millet    11.00
2018    Sep Pearl millet    11.00
2018    Oct Pearl millet    14.00
2018    Nov Pearl millet    9.00
2018    Dec Pearl millet    6.00
2019    Jan Pearl millet    7.00
2019    Feb Pearl millet    11.00
2019    March   Pearl millet    12.00
2019    May Pearl millet    5.00
2019    June    Pearl millet    7.00
2019    July    Pearl millet    9.00
2019    Aug Pearl millet    14.00
2019    Sep Pearl millet    11.00
2019    Oct Pearl millet    14.00
2019    Nov Pearl millet    7.00
2019    Dec Pearl millet    7.00
2015    Jan Finger Millet   5.00
2015    Feb Finger Millet   11.00
2015    March   Finger Millet   12.00
2015    May Finger Millet   5.00
2015    June    Finger Millet   7.00
2015    July    Finger Millet   9.00
2015    Aug Finger Millet   12.00
2015    Sep Finger Millet   10.00
2015    Oct Finger Millet   12.00
2015    Nov Finger Millet   8.00
2015    Dec Finger Millet   6.00
2016    Jan Finger Millet   5.00
2016    Feb Finger Millet   8.00
2016    March   Finger Millet   12.00
2016    May Finger Millet   4.00
2016    June    Finger Millet   9.00
2016    July    Finger Millet   9.00
2016    Aug Finger Millet   12.00
2016    Sep Finger Millet   10.00
2016    Oct Finger Millet   14.00
2016    Nov Finger Millet   7.00
2016    Dec Finger Millet   7.00
2017    Jan Finger Millet   6.00
2017    Feb Finger Millet   7.00
2017    March   Finger Millet   11.00
2017    May Finger Millet   2.00
2017    June    Finger Millet   7.00
2017    July    Finger Millet   10.00
2017    Aug Finger Millet   11.00
2017    Sep Finger Millet   12.00
2017    Oct Finger Millet   14.00
2017    Nov Finger Millet   8.00
2017    Dec Finger Millet   7.00
2018    Jan Finger Millet   8.00
2018    Feb Finger Millet   10.00
2018    March   Finger Millet   12.00
2018    May Finger Millet   4.00
2018    June    Finger Millet   7.00
2018    July    Finger Millet   10.00
2018    Aug Finger Millet   10.00
2018    Sep Finger Millet   12.00
2018    Oct Finger Millet   12.00
2018    Nov Finger Millet   9.00
2018    Dec Finger Millet   7.00
2019    Jan Finger Millet   6.00
2019    Feb Finger Millet   7.00
2019    March   Finger Millet   13.00
2019    May Finger Millet   4.00
2019    June    Finger Millet   9.00
2019    July    Finger Millet   9.00
2019    Aug Finger Millet   15.00
2019    Sep Finger Millet   12.00
2019    Oct Finger Millet   13.00
2019    Nov Finger Millet   9.00
2019    Dec Finger Millet   6.00

I am unable to mark the x-axis with Months and Y-axis with Incidence. Instead, I am getting density on the y-axis and intervals on x-axis. Is it possible to draw graphs like that with the above data?

P.S: Please forgive me. I just copy-pasted data from excel. Unable to vectorize it.

Upvotes: 0

Views: 107

Answers (2)

chemdork123
chemdork123

Reputation: 13883

I think first it's worthwhile to clear up a bit of confusion on the plot you are requesting.

Density Plots, Histograms, and a Stacked Line/Area Plot

A density plot is very similar to a histogram, which gives you information on the distribution of a numeric variable. Given your dataset, either a histogram or a density plot would be able to show, for example, generally how the Incidence is distributed when we consider the number you get monthly. You would be able to clearly visualize if one crop varied a lot (wide histogram) vs. another one which varied a little (narrow histogram). You would also get a picture of how different the average value per month was for each Crop.

From your description and example plot you share - this is not actually what you want. You are asking how we can demonstrate how the Incidence if each type of Crop varies over time. There are some options for how to show this, but the example you give is a variation of a line plot that is sometimes called an area plot. It seems that the example you show is a stacked variation of those plots, where the y axis shows Inicidence and the x axis is showing time, but the actual placement of the line on the y axis is a result of totalling the Crop values - in other words "stacking" the values on top of one another. As an example, if we knew that we had 3 million Groundnuts and 4 million Chickpeas grown in one given month, a stacked plot would have a total height of 7 million for that month, 3 million of which were identified as Groundnut and 4 million identified as Chickpea. If it were not stacked, you would have one point at 3 million for Groundnut and one point at 4 million for Chickpea.

The designation of line plot vs. area plot basically amounts to: is the area below the line filled in? If you shade in the area below the line, you might call this an "area plot", but without that, you would just say it's a line plot.

The plot

Now that you understand the designations, let's go ahead and make a stacked area plot. Your x axis is a date (month), but when the data is read into R, it will be identified as a character vector. The first step is to convert that column into a Date format so that it can be mapped correctly. ggplot2 needs to know this particular column should be treated as a continuous variable (like a date or numeric), rather than a discrete variable (like a character vector or factor). I'm using lubridate here to reclassify the column as a Date type, but you could also use as.POSIXct() or as.POSIXlt() or another date / datetime format. All use the convention of format= that is associated with srptime(), so if you want to know what %Y and %b are referring to, check the information included here.

One more point here is that dates need year, month, and day. So... that's why I'm just treating all dates as if they happen on the first of each month. :/ Not really a better way to do that IMO.

library(lubridate)
library(ggplot2)

df <- # your dataset
df$date <- paste(df$Year, df$Month, 1)
df$date <- as_date(df$date, format='%Y %b %d')

For the plot, we'll map df$date along the x axis and df$Incidence along the y axis. Then I'm using geom_area() to show the filled in line plot. Be default, geom_area() resorts to "stacking", but I'm explicitly state position='stack' to make that point clear here for you. I'm mapping df$Crop to both the color and fill aesthetics, since I want to see the actual lines of the plot as well as the fill. In order to visually differentiate the fill color and the lines, I'm setting the alpha a bit lower.

Finally, for the scale on the x axis, here's where lubridate really gets useful. I'm using the handy date_labels and date_format helper functions there inside scale_x_date() to set how I want the labels on the axis to look. You can do a lot here, so it depends on what you want. If it was my own plot, I might have played a bit with some fancy formatting of this axis text... but this is sufficient for general purposes.

Oh, and the expand= arguments on both scale_ functions are there to control the spacing of the distance from the edge of the plot to the start of the data.

ggplot(df, aes(date, Incidence)) +
  geom_area(aes(fill=Crop, color=Crop), position='stack', alpha=0.2) +
  scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
  scale_x_date(date_labels = '%b\n%Y', date_breaks = '6 month', expand = expansion(mult=0)) +
  theme_light()

enter image description here

Improving the Plot

That's the basic plot, but I'm not crazy happy with the labeling on the x axis. I see from your comments on the question that OP maybe is having most trouble setting the text/formatting on the x axis. Here's an example of things that you could do to make something really nice.

For one, I'd like to separate out each year, and then finally I may want to change some of the aspects of the formatting for the month text. To do this, I'm going to use faceting to separate out the years (by using facet_wrap() on Year). I'm going to force everything to be one row via nrow=1 and then also force the spacing to fit to the data using scales='free_x'. Then I'm using a combination of strip.position and theme() elements to fix the position and formatting of the strip text (facet text), settings for the axis text, and highlight some spacing between the facets. Importantly, I'm using panel.spacing.x to set the space between each facet to be 0 - thereby making no space between years and make it look like it's all one plot... when in reality it's actually a faceted plot, with each facet being one year.

Have fun messing with whatever format you want, but this is what I came up with:

ggplot(df, aes(date, Incidence)) +
  geom_area(aes(fill=Crop, color=Crop), position='stack', alpha=0.2) +
  scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
  scale_x_date(date_labels = '%b', date_breaks = '3 month', expand = expansion(mult=0)) +
  facet_wrap(~Year, scale='free_x', nrow=1, strip.position = 'bottom') +
  theme_light() +
  theme(
    panel.spacing.x = unit(0,'pt'),
    axis.text.x = element_text(angle=45, hjust=1, vjust=1),
    strip.placement = 'outside',
    strip.text = element_text(color='black', face='bold', size=13),
    strip.background = element_blank(),
    panel.border = element_rect(color='gray40')
  )

enter image description here

It's not perfect - note that the points between the facets have different spacing, so it looks weird. Probably a way to fix that, but you probably get the idea.

Upvotes: 1

tjebo
tjebo

Reputation: 23807

You could uncount first and then use y = after_stat(count). The problem is that you might expect that the density plot resembles a histogram - this is not necessarily the case, because this curve represents a smoothed estimate which is hugely dependent on the bandwidth.

library(tidyverse)

df_uncount <- 
df %>%
  mutate(date = lubridate::ym(paste(Year, Month))) %>% 
  uncount(weights = Incidence) 

ggplot(df_uncount, aes(x = date, color = Crop)) +
  geom_density(aes(y= after_stat(count))) +
  facet_grid(~Crop)

That this actually seems to work, is better seen on summarised data with less data points:


df_years <- 
  df %>%
  group_by(Crop, Year) %>%
  summarise(Incidence = sum(Incidence)) 
#> `summarise()` has grouped output by 'Crop'. You can override using the `.groups` argument.

df_years
#> # A tibble: 20 x 3
#> # Groups:   Crop [4]
#> Crop           Year Incidence
#> <chr>         <int>     <dbl>
#>   1 Finger_millet  2015        97
#> 2 Finger_millet  2016        97
#> 3 Finger_millet  2017        95
#> 4 Finger_millet  2018       101
#> 5 Finger_millet  2019       103
#> 6 Maize          2015        94
#> 7 Maize          2016       100
#> 8 Maize          2017        93
#> 9 Maize          2018        99
#> 10 Maize          2019       101
#> 11 Pearl_millet   2015       100
#> 12 Pearl_millet   2016       101
#> 13 Pearl_millet   2017        98
#> 14 Pearl_millet   2018       104
#> 15 Pearl_millet   2019       104
#> 16 Sorghum        2015       101
#> 17 Sorghum        2016       101
#> 18 Sorghum        2017        97
#> 19 Sorghum        2018        90
#> 20 Sorghum        2019        99


df_years %>%
  uncount(weights = Incidence)  %>%
  ggplot(aes(x = Year, color = Crop, group = Crop)) +
  geom_density(aes(y = after_stat(count))) +
  facet_grid(~Crop)

data

df <- read.table(header = T, text = "Year   Month   Crop    Incidence
2015    Jan Maize   6.00
2015    Feb Maize   10.00
2015    March   Maize   5.00
2015    May Maize   5.00
2015    June    Maize   7.00
2015    July    Maize   8.00
2015    Aug Maize   15.00
2015    Sep Maize   11.00
2015    Oct Maize   14.00
2015    Nov Maize   7.00
2015    Dec Maize   6.00
2016    Jan Maize   8.00
2016    Feb Maize   10.00
2016    March   Maize   11.00
2016    May Maize   5.00
2016    June    Maize   7.00
2016    July    Maize   9.00
2016    Aug Maize   12.00
2016    Sep Maize   12.00
2016    Oct Maize   13.00
2016    Nov Maize   7.00
2016    Dec Maize   6.00
2017    Jan Maize   5.00
2017    Feb Maize   8.00
2017    March   Maize   12.00
2017    May Maize   3.00
2017    June    Maize   7.00
2017    July    Maize   8.00
2017    Aug Maize   9.00
2017    Sep Maize   11.00
2017    Oct Maize   14.00
2017    Nov Maize   9.00
2017    Dec Maize   7.00
2018    Jan Maize   7.00
2018    Feb Maize   11.00
2018    March   Maize   13.00
2018    May Maize   2.00
2018    June    Maize   7.00
2018    July    Maize   8.00
2018    Aug Maize   15.00
2018    Sep Maize   10.00
2018    Oct Maize   12.00
2018    Nov Maize   8.00
2018    Dec Maize   6.00
2019    Jan Maize   8.00
2019    Feb Maize   8.00
2019    March   Maize   13.00
2019    May Maize   4.00
2019    June    Maize   9.00
2019    July    Maize   10.00
2019    Aug Maize   11.00
2019    Sep Maize   10.00
2019    Oct Maize   14.00
2019    Nov Maize   7.00
2019    Dec Maize   7.00
2015    Jan Sorghum 7.00
2015    Feb Sorghum 8.00
2015    March   Sorghum 13.00
2015    May Sorghum 3.00
2015    June    Sorghum 7.00
2015    July    Sorghum 8.00
2015    Aug Sorghum 14.00
2015    Sep Sorghum 12.00
2015    Oct Sorghum 13.00
2015    Nov Sorghum 9.00
2015    Dec Sorghum 7.00
2016    Jan Sorghum 7.00
2016    Feb Sorghum 11.00
2016    March   Sorghum 12.00
2016    May Sorghum 4.00
2016    June    Sorghum 8.00
2016    July    Sorghum 8.00
2016    Aug Sorghum 11.00
2016    Sep Sorghum 12.00
2016    Oct Sorghum 13.00
2016    Nov Sorghum 8.00
2016    Dec Sorghum 7.00
2017    Jan Sorghum 7.00
2017    Feb Sorghum 8.00
2017    March   Sorghum 12.00
2017    May Sorghum 2.00
2017    June    Sorghum 7.00
2017    July    Sorghum 9.00
2017    Aug Sorghum 11.00
2017    Sep Sorghum 12.00
2017    Oct Sorghum 14.00
2017    Nov Sorghum 9.00
2017    Dec Sorghum 6.00
2018    Jan Sorghum 7.00
2018    Feb Sorghum 7.00
2018    March   Sorghum 11.00
2018    May Sorghum 3.00
2018    June    Sorghum 7.00
2018    July    Sorghum 9.00
2018    Aug Sorghum 10.00
2018    Sep Sorghum 10.00
2018    Oct Sorghum 12.00
2018    Nov Sorghum 7.00
2018    Dec Sorghum 7.00
2019    Jan Sorghum 5.00
2019    Feb Sorghum 8.00
2019    March   Sorghum 13.00
2019    May Sorghum 5.00
2019    June    Sorghum 7.00
2019    July    Sorghum 9.00
2019    Aug Sorghum 15.00
2019    Sep Sorghum 10.00
2019    Oct Sorghum 12.00
2019    Nov Sorghum 8.00
2019    Dec Sorghum 7.00
2015    Jan Pearl_millet    7.00
2015    Feb Pearl_millet    8.00
2015    March   Pearl_millet    12.00
2015    May Pearl_millet    4.00
2015    June    Pearl_millet    9.00
2015    July    Pearl_millet    10.00
2015    Aug Pearl_millet    13.00
2015    Sep Pearl_millet    10.00
2015    Oct Pearl_millet    14.00
2015    Nov Pearl_millet    7.00
2015    Dec Pearl_millet    6.00
2016    Jan Pearl_millet    8.00
2016    Feb Pearl_millet    8.00
2016    March   Pearl_millet    12.00
2016    May Pearl_millet    2.00
2016    June    Pearl_millet    8.00
2016    July    Pearl_millet    10.00
2016    Aug Pearl_millet    13.00
2016    Sep Pearl_millet    11.00
2016    Oct Pearl_millet    14.00
2016    Nov Pearl_millet    8.00
2016    Dec Pearl_millet    7.00
2017    Jan Pearl_millet    5.00
2017    Feb Pearl_millet    11.00
2017    March   Pearl_millet    13.00
2017    May Pearl_millet    3.00
2017    June    Pearl_millet    9.00
2017    July    Pearl_millet    10.00
2017    Aug Pearl_millet    10.00
2017    Sep Pearl_millet    10.00
2017    Oct Pearl_millet    13.00
2017    Nov Pearl_millet    8.00
2017    Dec Pearl_millet    6.00
2018    Jan Pearl_millet    8.00
2018    Feb Pearl_millet    11.00
2018    March   Pearl_millet    13.00
2018    May Pearl_millet    5.00
2018    June    Pearl_millet    7.00
2018    July    Pearl_millet    9.00
2018    Aug Pearl_millet    11.00
2018    Sep Pearl_millet    11.00
2018    Oct Pearl_millet    14.00
2018    Nov Pearl_millet    9.00
2018    Dec Pearl_millet    6.00
2019    Jan Pearl_millet    7.00
2019    Feb Pearl_millet    11.00
2019    March   Pearl_millet    12.00
2019    May Pearl_millet    5.00
2019    June    Pearl_millet    7.00
2019    July    Pearl_millet    9.00
2019    Aug Pearl_millet    14.00
2019    Sep Pearl_millet    11.00
2019    Oct Pearl_millet    14.00
2019    Nov Pearl_millet    7.00
2019    Dec Pearl_millet    7.00
2015    Jan Finger_millet   5.00
2015    Feb Finger_millet   11.00
2015    March   Finger_millet   12.00
2015    May Finger_millet   5.00
2015    June    Finger_millet   7.00
2015    July    Finger_millet   9.00
2015    Aug Finger_millet   12.00
2015    Sep Finger_millet   10.00
2015    Oct Finger_millet   12.00
2015    Nov Finger_millet   8.00
2015    Dec Finger_millet   6.00
2016    Jan Finger_millet   5.00
2016    Feb Finger_millet   8.00
2016    March   Finger_millet   12.00
2016    May Finger_millet   4.00
2016    June    Finger_millet   9.00
2016    July    Finger_millet   9.00
2016    Aug Finger_millet   12.00
2016    Sep Finger_millet   10.00
2016    Oct Finger_millet   14.00
2016    Nov Finger_millet   7.00
2016    Dec Finger_millet   7.00
2017    Jan Finger_millet   6.00
2017    Feb Finger_millet   7.00
2017    March   Finger_millet   11.00
2017    May Finger_millet   2.00
2017    June    Finger_millet   7.00
2017    July    Finger_millet   10.00
2017    Aug Finger_millet   11.00
2017    Sep Finger_millet   12.00
2017    Oct Finger_millet   14.00
2017    Nov Finger_millet   8.00
2017    Dec Finger_millet   7.00
2018    Jan Finger_millet   8.00
2018    Feb Finger_millet   10.00
2018    March   Finger_millet   12.00
2018    May Finger_millet   4.00
2018    June    Finger_millet   7.00
2018    July    Finger_millet   10.00
2018    Aug Finger_millet   10.00
2018    Sep Finger_millet   12.00
2018    Oct Finger_millet   12.00
2018    Nov Finger_millet   9.00
2018    Dec Finger_millet   7.00
2019    Jan Finger_millet   6.00
2019    Feb Finger_millet   7.00
2019    March   Finger_millet   13.00
2019    May Finger_millet   4.00
2019    June    Finger_millet   9.00
2019    July    Finger_millet   9.00
2019    Aug Finger_millet   15.00
2019    Sep Finger_millet   12.00
2019    Oct Finger_millet   13.00
2019    Nov Finger_millet   9.00
2019    Dec Finger_millet   6.00")

Upvotes: 0

Related Questions