Reputation: 1599
I want to create a line chart in ggplot2
with 350 beer breweries. I want to count per year how many active breweries there are. I only have the start and end date of brewery activity. tidyverse
answers prefered.
begin_datum_jaar
is year the brewery started. eind_datum_jaar
is in which year the brewery has ended.
example data frame:
library(tidyverse)
# A tibble: 4 x 3
brouwerijnaam begin_datum_jaar eind_datum_jaar
<chr> <int> <int>
1 Brand 1340 2019
2 Heineken 1592 2019
3 Grolsche 1615 2019
4 Bavaria 1719 2010
dput:
df <- structure(list(brouwerijnaam = c("Brand", "Heineken", "Grolsche",
"Bavaria"), begin_datum_jaar = c(1340L, 1592L, 1615L, 1719L),
eind_datum_jaar = c(2019L, 2019L, 2019L, 2010L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
Desired output where etc.
is a placeholder.
# A tibble: 13 x 2
year n
<chr> <dbl>
1 1340 1
2 1341 1
3 1342 1
4 1343 1
5 etc. 1
6 1592 2
7 1593 2
8 etc. 2
9 1625 3
10 1626 3
11 1627 3
12 1628 3
13 etc. 3
Upvotes: 1
Views: 73
Reputation: 14774
Could try:
library(tidyverse)
df %>%
rowwise %>%
do(data.frame(brouwerij = .$brouwerijnaam,
Year = seq(.$begin_datum_jaar, .$eind_datum_jaar, by = 1))) %>%
count(Year, name = "Active breweries") %>%
ggplot(aes(x = Year, y = `Active breweries`)) +
geom_line() +
theme_minimal()
Or try expand
for the first part:
df %>%
group_by(brouwerijnaam) %>%
expand(Year = begin_datum_jaar:eind_datum_jaar) %>%
ungroup() %>%
count(Year, name = "Active breweries")
However, note that the rowwise
, do
or expand
parts are resource intensive and may take long time. If that happens, I'd rather use data.table
for expanding the data frame, and then continue, like below:
library(data.table)
library(tidyverse)
df <- setDT(df)[, .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)), by = brouwerijnaam]
df %>%
count(Year, name = "Active breweries") %>%
ggplot(aes(x = Year, y = `Active breweries`)) +
geom_line() +
theme_minimal()
The above gives you the plot directly. If you'd like to save it to a data frame first (and then do the ggplot2
thing), this is the main part (I use the data.table
for expanding as it's much faster in my experience):
library(data.table)
library(tidyverse)
df <- setDT(df)[
, .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)),
by = brouwerijnaam] %>%
count(Year, name = "Active breweries")
Output:
# A tibble: 680 x 2
Year `Active breweries`
<dbl> <int>
1 1340 1
2 1341 1
3 1342 1
4 1343 1
5 1344 1
6 1345 1
7 1346 1
8 1347 1
9 1348 1
10 1349 1
# ... with 670 more rows
Upvotes: 3
Reputation: 887901
We can use map2
to get the sequence from start to end date for each corresponding element, unnest
the list
column to expand and use count
to get the frequency of the 'year'
library(tidyverse)
df %>%
transmute(year = map2(begin_datum_jaar, eind_datum_jaar, `:`)) %>%
unnest %>%
count(year)
# A tibble: 680 x 2
# year n
# <int> <int>
# 1 1340 1
# 2 1341 1
# 3 1342 1
# 4 1343 1
# 5 1344 1
# 6 1345 1
# 7 1346 1
# 8 1347 1
# 9 1348 1
#10 1349 1
# … with 670 more rows
Or using Map
from base R
table(unlist(do.call(Map, c(f = `:`, df[-1]))))
Upvotes: 2
Reputation: 2017
df1 <- data.frame(year=1000:2020) # Enter range for years of choice
df1 %>%
rowwise()%>%
mutate(cnt=nrow(df %>%
filter(begin_datum_jaar<year & eind_datum_jaar>year)
)
)
Upvotes: 0