noxrain

Finding the top n most represented entries in a grouped dataframe in R

I am a beginner in R and would be very thankful for a response, as I am stuck on this code (this is my attempt at solving the problem, but it does not work):

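library(jsonlite)  # assumed source of fromJSON()
library(dplyr)     # %>%, mutate(), group_by(), filter()

personal_spotify_df <- fromJSON("data/StreamingHistory0.json")
personal_spotify_df <- personal_spotify_df %>%
  mutate(minutesPlayed = msPlayed / 1000 / 60)

# Fails: nrows() does not exist (dplyr's per-group row count is n()),
# and top_n() returns rows rather than a condition for filter()
personal_spotify_df_ranked <- personal_spotify_df %>%
  group_by(artistName) %>%
  filter(top_n(15, max(nrows())))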

I have a dataframe (see below for a screenshot of how it's structured) which is my Spotify listening history. I want to group this dataframe by artist and then arrange the result to show the top 15 artists with the most songs listened to. I am stuck on how to get from grouping by artistName to actually keeping only the 15 most represented artists in the dataframe.

[Screenshot of the dataframe not included]


Answers (2)

Ronak Shah

In base R, you can use table, sort and head to get the top 15 artists with their counts:

table(personal_spotify_df$artistName) |>
  sort(decreasing = TRUE) |>
  head(15) |>
  stack()

The native pipe operator (|>) requires R 4.1.0 or later. On an older version, nest the calls instead:

stack(head(sort(table(personal_spotify_df$artistName), decreasing = TRUE), 15))
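For a quick check, here is a sketch on a small made-up data frame (the toy values and the top_artists name are hypothetical, and it keeps the top 2 instead of 15). stack() turns the named counts into a data frame with a values column (the counts) and an ind column (the artist names):

```r
# Hypothetical toy data standing in for the Spotify history
# (one row per stream; only artistName matters here)
personal_spotify_df <- data.frame(
  artistName = c("A", "B", "A", "C", "B", "A", "D"),
  stringsAsFactors = FALSE
)

# Count streams per artist, sort in decreasing order, keep the top 2,
# then turn the named counts into a two-column data frame
top_artists <- stack(head(sort(table(personal_spotify_df$artistName),
                               decreasing = TRUE), 2))

print(top_artists)
```

Artist "A" (3 streams) comes first, followed by "B" (2 streams).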


akrun

We may use slice_max with n = 15, ordering by a count column created with add_count:

library(dplyr)
personal_spotify_df %>%
  add_count(artistName, name = "Count") %>%  # plays per artist, repeated on every row
  slice_max(order_by = Count, n = 15) %>%    # order_by takes the bare column, not a string
  select(-Count)
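To see what this row-level version returns, here is a sketch on a small made-up data frame (the plays object, its trackName column and the row_level name are hypothetical; it uses n = 2 instead of 15). Because every row of an artist carries the same Count, slice_max() with its default with_ties = TRUE keeps all rows tied with the n-th largest count, so the result is the streams of the most-played artist(s) rather than n distinct artists:

```r
library(dplyr)

# Hypothetical toy data: artist "A" has 3 streams, "B" 2, "C" 1
plays <- data.frame(
  artistName = c("A", "A", "A", "B", "B", "C"),
  trackName  = c("a1", "a2", "a3", "b1", "b2", "c1"),
  stringsAsFactors = FALSE
)

# Every row of "A" has Count = 3, so the "top 2 rows" are tied
# and all three "A" rows are kept
row_level <- plays %>%
  add_count(artistName, name = "Count") %>%
  slice_max(order_by = Count, n = 2) %>%
  select(-Count)

print(row_level)
```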

If we only want the top 15 distinct artistName values (one row per artist with its count):

personal_spotify_df %>%
  count(artistName, name = "Count") %>%
  slice_max(order_by = Count, n = 15)
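count(artistName) is shorthand for group_by(artistName) followed by summarise(n = n()), so the same result can be sketched with the group_by() step the question started from. This assumes dplyr 1.0.0 or later (for slice_max()); the toy data and the top_artists name are made up, and it keeps the top 2 instead of 15:

```r
library(dplyr)

# Hypothetical toy data standing in for the streaming history
personal_spotify_df <- data.frame(
  artistName = c("A", "B", "A", "C", "B", "A", "D"),
  stringsAsFactors = FALSE
)

top_artists <- personal_spotify_df %>%
  group_by(artistName) %>%
  summarise(Count = n()) %>%          # one row per artist with its play count
  slice_max(order_by = Count, n = 2)  # the 2 artists with the most plays

print(top_artists)
```

n() is the per-group row count that the question's nrows() was reaching for.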

Or an option with filter after arranging the rows by the count:

personal_spotify_df %>%
  add_count(artistName) %>%
  arrange(desc(n)) %>%
  filter(artistName %in% head(unique(artistName), 15))
