Count occurrences across multiple columns and group by year

Question

I have a movies dataset which has a column for year and three for genres.

Here's an example:

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class =  "data.frame"
  )

I am trying to count all 3 genres for each year. Expected result is (example):

genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

I tried:

genre_years_test<-genre_structure %>% 
  group_by(genre1, genre2, genre3, year) %>% 
  summarise(total=n(), .groups = "drop")

But it is repeating the years each time a new genre is released in that year.

akrun · Accepted Answer

We may reshape to 'long' and get the count

library(dplyr)
library(tidyr)
genre_structure %>% 
  pivot_longer(cols = -year, values_to = 'genre') %>%
  count(year, genre, name = 'count')

Count occurrences across multiple columns and group by year

Answers (2)

Related Questions