Reputation: 1599

Count per year with only start and end year data

I want to create a line chart in ggplot2 with 350 beer breweries. I want to count per year how many active breweries there are. I only have the start and end date of brewery activity. tidyverse answers prefered.

begin_datum_jaar is year the brewery started. eind_datum_jaar is in which year the brewery has ended.

example data frame:

library(tidyverse)

# A tibble: 4 x 3
  brouwerijnaam begin_datum_jaar eind_datum_jaar
  <chr>                    <int>           <int>
1 Brand                     1340            2019
2 Heineken                  1592            2019
3 Grolsche                  1615            2019
4 Bavaria                   1719            2010

dput:

df <- structure(list(brouwerijnaam = c("Brand", "Heineken", "Grolsche", 
"Bavaria"), begin_datum_jaar = c(1340L, 1592L, 1615L, 1719L), 
    eind_datum_jaar = c(2019L, 2019L, 2019L, 2010L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

Desired output where etc. is a placeholder.

# A tibble: 13 x 2
   year      n
   <chr> <dbl>
 1 1340      1
 2 1341      1
 3 1342      1
 4 1343      1
 5 etc.      1
 6 1592      2
 7 1593      2
 8 etc.      2
 9 1625      3
10 1626      3
11 1627      3
12 1628      3
13 etc.      3

Upvotes: 1

Answers (3)

arg0naut91

Reputation: 14774

Could try:

library(tidyverse)

df %>%
  rowwise %>%
  do(data.frame(brouwerij = .$brouwerijnaam,
                Year = seq(.$begin_datum_jaar, .$eind_datum_jaar, by = 1))) %>%
  count(Year, name = "Active breweries") %>%
  ggplot(aes(x = Year, y = `Active breweries`)) + 
  geom_line() +
  theme_minimal()

Or try expand for the first part:

df %>%
  group_by(brouwerijnaam) %>%
  expand(Year = begin_datum_jaar:eind_datum_jaar) %>%
  ungroup() %>%
  count(Year, name = "Active breweries")

However, note that the rowwise, do or expand parts are resource intensive and may take long time. If that happens, I'd rather use data.table for expanding the data frame, and then continue, like below:

library(data.table)
library(tidyverse)

df <- setDT(df)[, .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)), by = brouwerijnaam]

df %>%
  count(Year, name = "Active breweries") %>%
  ggplot(aes(x = Year, y = `Active breweries`)) + 
  geom_line() +
  theme_minimal()

The above gives you the plot directly. If you'd like to save it to a data frame first (and then do the ggplot2 thing), this is the main part (I use the data.table for expanding as it's much faster in my experience):

library(data.table)
library(tidyverse)

df <- setDT(df)[
  , .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)), 
  by = brouwerijnaam] %>%
  count(Year, name = "Active breweries")

Output:

# A tibble: 680 x 2
    Year `Active breweries`
   <dbl>              <int>
 1  1340                  1
 2  1341                  1
 3  1342                  1
 4  1343                  1
 5  1344                  1
 6  1345                  1
 7  1346                  1
 8  1347                  1
 9  1348                  1
10  1349                  1
# ... with 670 more rows

Upvotes: 3

akrun

Reputation: 887901

We can use map2 to get the sequence from start to end date for each corresponding element, unnest the list column to expand and use count to get the frequency of the 'year'

library(tidyverse)
df %>% 
   transmute(year = map2(begin_datum_jaar, eind_datum_jaar, `:`)) %>%
   unnest %>%
   count(year)
# A tibble: 680 x 2
#    year     n
#   <int> <int>
# 1  1340     1
# 2  1341     1
# 3  1342     1
# 4  1343     1
# 5  1344     1
# 6  1345     1
# 7  1346     1
# 8  1347     1
# 9  1348     1
#10  1349     1
# … with 670 more rows

Or using Map from base R

table(unlist(do.call(Map, c(f = `:`, df[-1]))))

Upvotes: 2

Rohit

Reputation: 2017

  df1 <- data.frame(year=1000:2020) # Enter range for years of choice

  df1 %>% 
  rowwise()%>% 
  mutate(cnt=nrow(df %>% 
                    filter(begin_datum_jaar<year & eind_datum_jaar>year) 
                  )
         )

Upvotes: 0

Count per year with only start and end year data

Answers (3)

Related Questions