Finding the top n represented entries in a grouped dataframe in R

Question

I am a beginner in R and would be very grateful for a response, as I am stuck on this code (this is my attempt at solving the problem, but it does not work):

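library(jsonlite)  # assumed source of fromJSON()
library(dplyr)

personal_spotify_df <- fromJSON("data/StreamingHistory0.json")
personal_spotify_df <- personal_spotify_df %>%
  mutate(minutesPlayed = msPlayed / 1000 / 60)  # milliseconds -> minutes

personal_spotify_df_ranked <- personal_spotify_df %>%
  group_by(artistName) %>%
  filter(top_n(15, max(nrows())))  # this is the step that does not work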

I have a dataframe (see below for a screenshot of how it is structured) containing my Spotify listening history. I want to group it by artist and then arrange the result to show the 15 artists with the most songs listened to. I am stuck on how to get from grouping by artistName to actually filtering out the 15 most represented artists.

The dataframe

akrun · Accepted Answer

We may use slice_max, with n set to 15 and the ordering column created with add_count:

library(dplyr)
personal_spotify_df %>%
  add_count(artistName, name = "Count") %>%  # plays per artist, on every row
  slice_max(n = 15, order_by = Count) %>%    # unquoted column; ties are kept
  select(-Count)
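Note that slice_max() here works on rows, not on artists: add_count() writes the same Count onto every row of an artist, so all rows of the most-played artist tie, and with the default with_ties = TRUE they are all kept. If that artist has 15 or more plays, only its rows come back. The sketch below checks this on a small hypothetical data frame (toy, with n = 2 standing in for 15) and shows one alternative, ranking the counts with dense_rank(), that keeps every row of the top artists. Because tied counts share a dense rank, artists tied at the cutoff are all kept, so slightly more than n artists can be returned.

```r
library(dplyr)

# Hypothetical toy data: "A" has 3 plays, "B" has 2, "C" has 1
toy <- data.frame(
  artistName = c("A", "A", "A", "B", "B", "C"),
  trackName  = paste0("track", 1:6),
  stringsAsFactors = FALSE
)

# Row-level slice_max(): the three "A" rows tie at Count = 3, so with
# n = 2 (and the default with_ties = TRUE) only the "A" rows are returned
rows_by_count <- toy %>%
  add_count(artistName, name = "Count") %>%
  slice_max(n = 2, order_by = Count)

# Ranking the per-artist counts keeps every row of the top 2 artists;
# dense_rank() gives equal counts the same rank, so ties at the cutoff stay in
top_artist_rows <- toy %>%
  add_count(artistName, name = "Count") %>%
  filter(dense_rank(desc(Count)) <= 2) %>%
  select(-Count)

rows_by_count
top_artist_rows
```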

If we want only the top 15 distinct 'artistName' values (one row per artist, with its count):

personal_spotify_df %>%
  count(artistName, name = "Count") %>%  # one row per artist
  slice_max(n = 15, order_by = Count)
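For comparison, the same per-artist count can be done in base R without dplyr. A minimal sketch on hypothetical toy data (k = 2 in place of 15): table() counts plays per artist, sort() puts the largest counts first, and head() keeps the first k. Unlike slice_max(), which keeps ties by default, head() cuts at exactly k entries.

```r
# Hypothetical toy data standing in for personal_spotify_df
toy <- data.frame(
  artistName = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

k <- 2  # use 15 on the real data

# table() counts plays per artist; sort() orders them; head() keeps k
play_counts <- sort(table(toy$artistName), decreasing = TRUE)
top_k <- head(play_counts, k)

# Convert to a data frame with the same columns as the count() version
top_k_df <- data.frame(
  artistName = names(top_k),
  Count = as.integer(top_k),
  stringsAsFactors = FALSE
)
top_k_df
```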

Or an option with filter, after arranging the rows by count:

personal_spotify_df %>%
  add_count(artistName) %>%  # adds column n: plays per artist
  arrange(desc(n)) %>%
  filter(artistName %in% head(unique(artistName), 15))  # first 15 artists in count order
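The question started from group_by(artistName), and that route also works once the group sizes are summarised. Inside a grouped summarise(), dplyr's n() returns the number of rows in the current group (base R's function is nrow(), not nrows()), and count() is essentially shorthand for this group_by() plus summarise() pattern. A sketch on hypothetical toy data, with head(2) standing in for head(15):

```r
library(dplyr)

# Hypothetical toy data standing in for personal_spotify_df
toy <- data.frame(
  artistName = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

top_artists <- toy %>%
  group_by(artistName) %>%
  summarise(Count = n(), .groups = "drop") %>%  # n() = rows per artist
  arrange(desc(Count)) %>%                      # most-played first
  head(2)                                       # use 15 on the real data

top_artists
```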
