Shashivydyula
Shashivydyula

Reputation: 85

Count occurrences across multiple columns and group by year

I have a movies dataset which has a column for year and three for genres.

Here's an example:

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class =  "data.frame"
  )

I am trying to count all 3 genres for each year. Expected result is (example):

genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

I tried:

genre_years_test<-genre_structure %>% 
  group_by(genre1, genre2, genre3, year) %>% 
  summarise(total=n(), .groups = "drop")

But it is repeating the years each time a new genre is released in that year.

Upvotes: 2

Views: 182

Answers (2)

M--
M--

Reputation: 28826

Just for laffs, here's a solution in base:

subset(as.data.frame(
        table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
                    ), Freq != 0)

And in data.table:

library(data.table)

melt(setDT(genre_structure), id.vars = c("year"),
                             variable.name = "genre")[, list(Freq =.N), 
                                                       .(year, value)]

Upvotes: 3

akrun
akrun

Reputation: 886938

We may reshape to 'long' and get the count

library(dplyr)
library(tidyr)
genre_structure %>% 
  pivot_longer(cols = -year, values_to = 'genre') %>%
  count(year, genre, name = 'count')

Upvotes: 3

Related Questions