Reputation: 9
I want to create density graphs like this in R using ggplot with data.-
Data is like this:
Year Month Crop Incidence
2015 Jan Maize 6.00
2015 Feb Maize 10.00
2015 March Maize 5.00
2015 May Maize 5.00
2015 June Maize 7.00
2015 July Maize 8.00
2015 Aug Maize 15.00
2015 Sep Maize 11.00
2015 Oct Maize 14.00
2015 Nov Maize 7.00
2015 Dec Maize 6.00
2016 Jan Maize 8.00
2016 Feb Maize 10.00
2016 March Maize 11.00
2016 May Maize 5.00
2016 June Maize 7.00
2016 July Maize 9.00
2016 Aug Maize 12.00
2016 Sep Maize 12.00
2016 Oct Maize 13.00
2016 Nov Maize 7.00
2016 Dec Maize 6.00
2017 Jan Maize 5.00
2017 Feb Maize 8.00
2017 March Maize 12.00
2017 May Maize 3.00
2017 June Maize 7.00
2017 July Maize 8.00
2017 Aug Maize 9.00
2017 Sep Maize 11.00
2017 Oct Maize 14.00
2017 Nov Maize 9.00
2017 Dec Maize 7.00
2018 Jan Maize 7.00
2018 Feb Maize 11.00
2018 March Maize 13.00
2018 May Maize 2.00
2018 June Maize 7.00
2018 July Maize 8.00
2018 Aug Maize 15.00
2018 Sep Maize 10.00
2018 Oct Maize 12.00
2018 Nov Maize 8.00
2018 Dec Maize 6.00
2019 Jan Maize 8.00
2019 Feb Maize 8.00
2019 March Maize 13.00
2019 May Maize 4.00
2019 June Maize 9.00
2019 July Maize 10.00
2019 Aug Maize 11.00
2019 Sep Maize 10.00
2019 Oct Maize 14.00
2019 Nov Maize 7.00
2019 Dec Maize 7.00
2015 Jan Sorghum 7.00
2015 Feb Sorghum 8.00
2015 March Sorghum 13.00
2015 May Sorghum 3.00
2015 June Sorghum 7.00
2015 July Sorghum 8.00
2015 Aug Sorghum 14.00
2015 Sep Sorghum 12.00
2015 Oct Sorghum 13.00
2015 Nov Sorghum 9.00
2015 Dec Sorghum 7.00
2016 Jan Sorghum 7.00
2016 Feb Sorghum 11.00
2016 March Sorghum 12.00
2016 May Sorghum 4.00
2016 June Sorghum 8.00
2016 July Sorghum 8.00
2016 Aug Sorghum 11.00
2016 Sep Sorghum 12.00
2016 Oct Sorghum 13.00
2016 Nov Sorghum 8.00
2016 Dec Sorghum 7.00
2017 Jan Sorghum 7.00
2017 Feb Sorghum 8.00
2017 March Sorghum 12.00
2017 May Sorghum 2.00
2017 June Sorghum 7.00
2017 July Sorghum 9.00
2017 Aug Sorghum 11.00
2017 Sep Sorghum 12.00
2017 Oct Sorghum 14.00
2017 Nov Sorghum 9.00
2017 Dec Sorghum 6.00
2018 Jan Sorghum 7.00
2018 Feb Sorghum 7.00
2018 March Sorghum 11.00
2018 May Sorghum 3.00
2018 June Sorghum 7.00
2018 July Sorghum 9.00
2018 Aug Sorghum 10.00
2018 Sep Sorghum 10.00
2018 Oct Sorghum 12.00
2018 Nov Sorghum 7.00
2018 Dec Sorghum 7.00
2019 Jan Sorghum 5.00
2019 Feb Sorghum 8.00
2019 March Sorghum 13.00
2019 May Sorghum 5.00
2019 June Sorghum 7.00
2019 July Sorghum 9.00
2019 Aug Sorghum 15.00
2019 Sep Sorghum 10.00
2019 Oct Sorghum 12.00
2019 Nov Sorghum 8.00
2019 Dec Sorghum 7.00
2015 Jan Pearl millet 7.00
2015 Feb Pearl millet 8.00
2015 March Pearl millet 12.00
2015 May Pearl millet 4.00
2015 June Pearl millet 9.00
2015 July Pearl millet 10.00
2015 Aug Pearl millet 13.00
2015 Sep Pearl millet 10.00
2015 Oct Pearl millet 14.00
2015 Nov Pearl millet 7.00
2015 Dec Pearl millet 6.00
2016 Jan Pearl millet 8.00
2016 Feb Pearl millet 8.00
2016 March Pearl millet 12.00
2016 May Pearl millet 2.00
2016 June Pearl millet 8.00
2016 July Pearl millet 10.00
2016 Aug Pearl millet 13.00
2016 Sep Pearl millet 11.00
2016 Oct Pearl millet 14.00
2016 Nov Pearl millet 8.00
2016 Dec Pearl millet 7.00
2017 Jan Pearl millet 5.00
2017 Feb Pearl millet 11.00
2017 March Pearl millet 13.00
2017 May Pearl millet 3.00
2017 June Pearl millet 9.00
2017 July Pearl millet 10.00
2017 Aug Pearl millet 10.00
2017 Sep Pearl millet 10.00
2017 Oct Pearl millet 13.00
2017 Nov Pearl millet 8.00
2017 Dec Pearl millet 6.00
2018 Jan Pearl millet 8.00
2018 Feb Pearl millet 11.00
2018 March Pearl millet 13.00
2018 May Pearl millet 5.00
2018 June Pearl millet 7.00
2018 July Pearl millet 9.00
2018 Aug Pearl millet 11.00
2018 Sep Pearl millet 11.00
2018 Oct Pearl millet 14.00
2018 Nov Pearl millet 9.00
2018 Dec Pearl millet 6.00
2019 Jan Pearl millet 7.00
2019 Feb Pearl millet 11.00
2019 March Pearl millet 12.00
2019 May Pearl millet 5.00
2019 June Pearl millet 7.00
2019 July Pearl millet 9.00
2019 Aug Pearl millet 14.00
2019 Sep Pearl millet 11.00
2019 Oct Pearl millet 14.00
2019 Nov Pearl millet 7.00
2019 Dec Pearl millet 7.00
2015 Jan Finger Millet 5.00
2015 Feb Finger Millet 11.00
2015 March Finger Millet 12.00
2015 May Finger Millet 5.00
2015 June Finger Millet 7.00
2015 July Finger Millet 9.00
2015 Aug Finger Millet 12.00
2015 Sep Finger Millet 10.00
2015 Oct Finger Millet 12.00
2015 Nov Finger Millet 8.00
2015 Dec Finger Millet 6.00
2016 Jan Finger Millet 5.00
2016 Feb Finger Millet 8.00
2016 March Finger Millet 12.00
2016 May Finger Millet 4.00
2016 June Finger Millet 9.00
2016 July Finger Millet 9.00
2016 Aug Finger Millet 12.00
2016 Sep Finger Millet 10.00
2016 Oct Finger Millet 14.00
2016 Nov Finger Millet 7.00
2016 Dec Finger Millet 7.00
2017 Jan Finger Millet 6.00
2017 Feb Finger Millet 7.00
2017 March Finger Millet 11.00
2017 May Finger Millet 2.00
2017 June Finger Millet 7.00
2017 July Finger Millet 10.00
2017 Aug Finger Millet 11.00
2017 Sep Finger Millet 12.00
2017 Oct Finger Millet 14.00
2017 Nov Finger Millet 8.00
2017 Dec Finger Millet 7.00
2018 Jan Finger Millet 8.00
2018 Feb Finger Millet 10.00
2018 March Finger Millet 12.00
2018 May Finger Millet 4.00
2018 June Finger Millet 7.00
2018 July Finger Millet 10.00
2018 Aug Finger Millet 10.00
2018 Sep Finger Millet 12.00
2018 Oct Finger Millet 12.00
2018 Nov Finger Millet 9.00
2018 Dec Finger Millet 7.00
2019 Jan Finger Millet 6.00
2019 Feb Finger Millet 7.00
2019 March Finger Millet 13.00
2019 May Finger Millet 4.00
2019 June Finger Millet 9.00
2019 July Finger Millet 9.00
2019 Aug Finger Millet 15.00
2019 Sep Finger Millet 12.00
2019 Oct Finger Millet 13.00
2019 Nov Finger Millet 9.00
2019 Dec Finger Millet 6.00
I am unable to mark the x-axis with Months and Y-axis with Incidence. Instead, I am getting density on the y-axis and intervals on x-axis. Is it possible to draw graphs like that with the above data?
P.S: Please forgive me. I just copy-pasted data from excel. Unable to vectorize it.
Upvotes: 0
Views: 107
Reputation: 13883
I think first it's worthwhile to clear up a bit of confusion on the plot you are requesting.
A density plot is very similar to a histogram, which gives you information on the distribution of a numeric variable. Given your dataset, either a histogram or a density plot would be able to show, for example, generally how the Incidence
is distributed when we consider the number you get monthly. You would be able to clearly visualize if one crop varied a lot (wide histogram) vs. another one which varied a little (narrow histogram). You would also get a picture of how different the average value per month was for each Crop
.
From your description and example plot you share - this is not actually what you want. You are asking how we can demonstrate how the Incidence
if each type of Crop
varies over time. There are some options for how to show this, but the example you give is a variation of a line plot that is sometimes called an area plot. It seems that the example you show is a stacked variation of those plots, where the y axis shows Inicidence
and the x axis is showing time, but the actual placement of the line on the y axis is a result of totalling the Crop
values - in other words "stacking" the values on top of one another. As an example, if we knew that we had 3 million Groundnuts and 4 million Chickpeas grown in one given month, a stacked plot would have a total height of 7 million for that month, 3 million of which were identified as Groundnut and 4 million identified as Chickpea. If it were not stacked, you would have one point at 3 million for Groundnut and one point at 4 million for Chickpea.
The designation of line plot vs. area plot basically amounts to: is the area below the line filled in? If you shade in the area below the line, you might call this an "area plot", but without that, you would just say it's a line plot.
Now that you understand the designations, let's go ahead and make a stacked area plot. Your x axis is a date (month), but when the data is read into R, it will be identified as a character vector. The first step is to convert that column into a Date
format so that it can be mapped correctly. ggplot2
needs to know this particular column should be treated as a continuous variable (like a date or numeric), rather than a discrete variable (like a character vector or factor). I'm using lubridate
here to reclassify the column as a Date
type, but you could also use as.POSIXct()
or as.POSIXlt()
or another date / datetime format. All use the convention of format=
that is associated with srptime()
, so if you want to know what %Y
and %b
are referring to, check the information included here.
One more point here is that dates need year, month, and day. So... that's why I'm just treating all dates as if they happen on the first of each month. :/ Not really a better way to do that IMO.
library(lubridate)
library(ggplot2)
df <- # your dataset
df$date <- paste(df$Year, df$Month, 1)
df$date <- as_date(df$date, format='%Y %b %d')
For the plot, we'll map df$date
along the x axis and df$Incidence
along the y axis. Then I'm using geom_area()
to show the filled in line plot. Be default, geom_area()
resorts to "stacking", but I'm explicitly state position='stack'
to make that point clear here for you. I'm mapping df$Crop
to both the color
and fill
aesthetics, since I want to see the actual lines of the plot as well as the fill. In order to visually differentiate the fill color and the lines, I'm setting the alpha
a bit lower.
Finally, for the scale on the x axis, here's where lubridate
really gets useful. I'm using the handy date_labels
and date_format
helper functions there inside scale_x_date()
to set how I want the labels on the axis to look. You can do a lot here, so it depends on what you want. If it was my own plot, I might have played a bit with some fancy formatting of this axis text... but this is sufficient for general purposes.
Oh, and the expand=
arguments on both scale_
functions are there to control the spacing of the distance from the edge of the plot to the start of the data.
ggplot(df, aes(date, Incidence)) +
geom_area(aes(fill=Crop, color=Crop), position='stack', alpha=0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
scale_x_date(date_labels = '%b\n%Y', date_breaks = '6 month', expand = expansion(mult=0)) +
theme_light()
That's the basic plot, but I'm not crazy happy with the labeling on the x axis. I see from your comments on the question that OP maybe is having most trouble setting the text/formatting on the x axis. Here's an example of things that you could do to make something really nice.
For one, I'd like to separate out each year, and then finally I may want to change some of the aspects of the formatting for the month text. To do this, I'm going to use faceting to separate out the years (by using facet_wrap()
on Year
). I'm going to force everything to be one row via nrow=1
and then also force the spacing to fit to the data using scales='free_x'
. Then I'm using a combination of strip.position
and theme()
elements to fix the position and formatting of the strip text (facet text), settings for the axis text, and highlight some spacing between the facets. Importantly, I'm using panel.spacing.x
to set the space between each facet to be 0 - thereby making no space between years and make it look like it's all one plot... when in reality it's actually a faceted plot, with each facet being one year.
Have fun messing with whatever format you want, but this is what I came up with:
ggplot(df, aes(date, Incidence)) +
geom_area(aes(fill=Crop, color=Crop), position='stack', alpha=0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
scale_x_date(date_labels = '%b', date_breaks = '3 month', expand = expansion(mult=0)) +
facet_wrap(~Year, scale='free_x', nrow=1, strip.position = 'bottom') +
theme_light() +
theme(
panel.spacing.x = unit(0,'pt'),
axis.text.x = element_text(angle=45, hjust=1, vjust=1),
strip.placement = 'outside',
strip.text = element_text(color='black', face='bold', size=13),
strip.background = element_blank(),
panel.border = element_rect(color='gray40')
)
It's not perfect - note that the points between the facets have different spacing, so it looks weird. Probably a way to fix that, but you probably get the idea.
Upvotes: 1
Reputation: 23807
You could uncount
first and then use y = after_stat(count)
. The problem is that you might expect that the density plot resembles a histogram - this is not necessarily the case, because this curve represents a smoothed estimate which is hugely dependent on the bandwidth.
library(tidyverse)
df_uncount <-
df %>%
mutate(date = lubridate::ym(paste(Year, Month))) %>%
uncount(weights = Incidence)
ggplot(df_uncount, aes(x = date, color = Crop)) +
geom_density(aes(y= after_stat(count))) +
facet_grid(~Crop)
That this actually seems to work, is better seen on summarised data with less data points:
df_years <-
df %>%
group_by(Crop, Year) %>%
summarise(Incidence = sum(Incidence))
#> `summarise()` has grouped output by 'Crop'. You can override using the `.groups` argument.
df_years
#> # A tibble: 20 x 3
#> # Groups: Crop [4]
#> Crop Year Incidence
#> <chr> <int> <dbl>
#> 1 Finger_millet 2015 97
#> 2 Finger_millet 2016 97
#> 3 Finger_millet 2017 95
#> 4 Finger_millet 2018 101
#> 5 Finger_millet 2019 103
#> 6 Maize 2015 94
#> 7 Maize 2016 100
#> 8 Maize 2017 93
#> 9 Maize 2018 99
#> 10 Maize 2019 101
#> 11 Pearl_millet 2015 100
#> 12 Pearl_millet 2016 101
#> 13 Pearl_millet 2017 98
#> 14 Pearl_millet 2018 104
#> 15 Pearl_millet 2019 104
#> 16 Sorghum 2015 101
#> 17 Sorghum 2016 101
#> 18 Sorghum 2017 97
#> 19 Sorghum 2018 90
#> 20 Sorghum 2019 99
df_years %>%
uncount(weights = Incidence) %>%
ggplot(aes(x = Year, color = Crop, group = Crop)) +
geom_density(aes(y = after_stat(count))) +
facet_grid(~Crop)
data
df <- read.table(header = T, text = "Year Month Crop Incidence
2015 Jan Maize 6.00
2015 Feb Maize 10.00
2015 March Maize 5.00
2015 May Maize 5.00
2015 June Maize 7.00
2015 July Maize 8.00
2015 Aug Maize 15.00
2015 Sep Maize 11.00
2015 Oct Maize 14.00
2015 Nov Maize 7.00
2015 Dec Maize 6.00
2016 Jan Maize 8.00
2016 Feb Maize 10.00
2016 March Maize 11.00
2016 May Maize 5.00
2016 June Maize 7.00
2016 July Maize 9.00
2016 Aug Maize 12.00
2016 Sep Maize 12.00
2016 Oct Maize 13.00
2016 Nov Maize 7.00
2016 Dec Maize 6.00
2017 Jan Maize 5.00
2017 Feb Maize 8.00
2017 March Maize 12.00
2017 May Maize 3.00
2017 June Maize 7.00
2017 July Maize 8.00
2017 Aug Maize 9.00
2017 Sep Maize 11.00
2017 Oct Maize 14.00
2017 Nov Maize 9.00
2017 Dec Maize 7.00
2018 Jan Maize 7.00
2018 Feb Maize 11.00
2018 March Maize 13.00
2018 May Maize 2.00
2018 June Maize 7.00
2018 July Maize 8.00
2018 Aug Maize 15.00
2018 Sep Maize 10.00
2018 Oct Maize 12.00
2018 Nov Maize 8.00
2018 Dec Maize 6.00
2019 Jan Maize 8.00
2019 Feb Maize 8.00
2019 March Maize 13.00
2019 May Maize 4.00
2019 June Maize 9.00
2019 July Maize 10.00
2019 Aug Maize 11.00
2019 Sep Maize 10.00
2019 Oct Maize 14.00
2019 Nov Maize 7.00
2019 Dec Maize 7.00
2015 Jan Sorghum 7.00
2015 Feb Sorghum 8.00
2015 March Sorghum 13.00
2015 May Sorghum 3.00
2015 June Sorghum 7.00
2015 July Sorghum 8.00
2015 Aug Sorghum 14.00
2015 Sep Sorghum 12.00
2015 Oct Sorghum 13.00
2015 Nov Sorghum 9.00
2015 Dec Sorghum 7.00
2016 Jan Sorghum 7.00
2016 Feb Sorghum 11.00
2016 March Sorghum 12.00
2016 May Sorghum 4.00
2016 June Sorghum 8.00
2016 July Sorghum 8.00
2016 Aug Sorghum 11.00
2016 Sep Sorghum 12.00
2016 Oct Sorghum 13.00
2016 Nov Sorghum 8.00
2016 Dec Sorghum 7.00
2017 Jan Sorghum 7.00
2017 Feb Sorghum 8.00
2017 March Sorghum 12.00
2017 May Sorghum 2.00
2017 June Sorghum 7.00
2017 July Sorghum 9.00
2017 Aug Sorghum 11.00
2017 Sep Sorghum 12.00
2017 Oct Sorghum 14.00
2017 Nov Sorghum 9.00
2017 Dec Sorghum 6.00
2018 Jan Sorghum 7.00
2018 Feb Sorghum 7.00
2018 March Sorghum 11.00
2018 May Sorghum 3.00
2018 June Sorghum 7.00
2018 July Sorghum 9.00
2018 Aug Sorghum 10.00
2018 Sep Sorghum 10.00
2018 Oct Sorghum 12.00
2018 Nov Sorghum 7.00
2018 Dec Sorghum 7.00
2019 Jan Sorghum 5.00
2019 Feb Sorghum 8.00
2019 March Sorghum 13.00
2019 May Sorghum 5.00
2019 June Sorghum 7.00
2019 July Sorghum 9.00
2019 Aug Sorghum 15.00
2019 Sep Sorghum 10.00
2019 Oct Sorghum 12.00
2019 Nov Sorghum 8.00
2019 Dec Sorghum 7.00
2015 Jan Pearl_millet 7.00
2015 Feb Pearl_millet 8.00
2015 March Pearl_millet 12.00
2015 May Pearl_millet 4.00
2015 June Pearl_millet 9.00
2015 July Pearl_millet 10.00
2015 Aug Pearl_millet 13.00
2015 Sep Pearl_millet 10.00
2015 Oct Pearl_millet 14.00
2015 Nov Pearl_millet 7.00
2015 Dec Pearl_millet 6.00
2016 Jan Pearl_millet 8.00
2016 Feb Pearl_millet 8.00
2016 March Pearl_millet 12.00
2016 May Pearl_millet 2.00
2016 June Pearl_millet 8.00
2016 July Pearl_millet 10.00
2016 Aug Pearl_millet 13.00
2016 Sep Pearl_millet 11.00
2016 Oct Pearl_millet 14.00
2016 Nov Pearl_millet 8.00
2016 Dec Pearl_millet 7.00
2017 Jan Pearl_millet 5.00
2017 Feb Pearl_millet 11.00
2017 March Pearl_millet 13.00
2017 May Pearl_millet 3.00
2017 June Pearl_millet 9.00
2017 July Pearl_millet 10.00
2017 Aug Pearl_millet 10.00
2017 Sep Pearl_millet 10.00
2017 Oct Pearl_millet 13.00
2017 Nov Pearl_millet 8.00
2017 Dec Pearl_millet 6.00
2018 Jan Pearl_millet 8.00
2018 Feb Pearl_millet 11.00
2018 March Pearl_millet 13.00
2018 May Pearl_millet 5.00
2018 June Pearl_millet 7.00
2018 July Pearl_millet 9.00
2018 Aug Pearl_millet 11.00
2018 Sep Pearl_millet 11.00
2018 Oct Pearl_millet 14.00
2018 Nov Pearl_millet 9.00
2018 Dec Pearl_millet 6.00
2019 Jan Pearl_millet 7.00
2019 Feb Pearl_millet 11.00
2019 March Pearl_millet 12.00
2019 May Pearl_millet 5.00
2019 June Pearl_millet 7.00
2019 July Pearl_millet 9.00
2019 Aug Pearl_millet 14.00
2019 Sep Pearl_millet 11.00
2019 Oct Pearl_millet 14.00
2019 Nov Pearl_millet 7.00
2019 Dec Pearl_millet 7.00
2015 Jan Finger_millet 5.00
2015 Feb Finger_millet 11.00
2015 March Finger_millet 12.00
2015 May Finger_millet 5.00
2015 June Finger_millet 7.00
2015 July Finger_millet 9.00
2015 Aug Finger_millet 12.00
2015 Sep Finger_millet 10.00
2015 Oct Finger_millet 12.00
2015 Nov Finger_millet 8.00
2015 Dec Finger_millet 6.00
2016 Jan Finger_millet 5.00
2016 Feb Finger_millet 8.00
2016 March Finger_millet 12.00
2016 May Finger_millet 4.00
2016 June Finger_millet 9.00
2016 July Finger_millet 9.00
2016 Aug Finger_millet 12.00
2016 Sep Finger_millet 10.00
2016 Oct Finger_millet 14.00
2016 Nov Finger_millet 7.00
2016 Dec Finger_millet 7.00
2017 Jan Finger_millet 6.00
2017 Feb Finger_millet 7.00
2017 March Finger_millet 11.00
2017 May Finger_millet 2.00
2017 June Finger_millet 7.00
2017 July Finger_millet 10.00
2017 Aug Finger_millet 11.00
2017 Sep Finger_millet 12.00
2017 Oct Finger_millet 14.00
2017 Nov Finger_millet 8.00
2017 Dec Finger_millet 7.00
2018 Jan Finger_millet 8.00
2018 Feb Finger_millet 10.00
2018 March Finger_millet 12.00
2018 May Finger_millet 4.00
2018 June Finger_millet 7.00
2018 July Finger_millet 10.00
2018 Aug Finger_millet 10.00
2018 Sep Finger_millet 12.00
2018 Oct Finger_millet 12.00
2018 Nov Finger_millet 9.00
2018 Dec Finger_millet 7.00
2019 Jan Finger_millet 6.00
2019 Feb Finger_millet 7.00
2019 March Finger_millet 13.00
2019 May Finger_millet 4.00
2019 June Finger_millet 9.00
2019 July Finger_millet 9.00
2019 Aug Finger_millet 15.00
2019 Sep Finger_millet 12.00
2019 Oct Finger_millet 13.00
2019 Nov Finger_millet 9.00
2019 Dec Finger_millet 6.00")
Upvotes: 0