Python: Counting the number of titles and the average rating per genre

Question

I'm very new to Python and don't know how to proceed. I have a CSV file with over 100k rows in the following structure:

title,genres,rating
Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.0
Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.1
Star Wars,Adventure|Animation|Children|Comedy|Fantasy,4.5
Toy Story,Adventure|Animation|Children|Comedy|Fantasy,2.5
...

I need to find the number of titles in each genre and the average rating of each genre.

I already have a csv.reader set up, but I don't know how to count the titles per genre or compute each genre's average rating.

Thanks for any help!

ThePyGuy · Accepted Answer

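Split genres on |, explode the resulting lists so that each genre gets its own row, group by genres, and aggregate with size for title and mean for rating. Note that size counts rows, so a title that appears on several rows is counted once per row.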

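import pandas as pd
# df is assumed to be loaded already, e.g. df = pd.read_csv(...),
# with the columns title, genres and rating
# Turn "A|B|C" into the list ['A', 'B', 'C']
df['genres'] = df['genres'].str.split('|')
# One row per (title, genre) pair, then aggregate per genre
df = (df.explode('genres')
        .groupby('genres')[['title', 'rating']]
        .agg({'title': 'size', 'rating': 'mean'})
      )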

OUTPUT:

           title  rating
genres                  
Adventure      4   3.775
Animation      4   3.775
Children       4   3.775
Comedy         4   3.775
Fantasy        4   3.775
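
Since the question mentions already having a csv.reader, here is a minimal sketch of the same split / explode / group-by idea using only the standard library. It is not part of the original answer. The helper name genre_stats and the inline SAMPLE string are made up for illustration, and the column names are assumed to match the header shown in the question. It reports both the row count (what size gives above) and the number of distinct titles, because the sample contains "Lord of the Rings" twice.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; a real script would open the CSV file instead.
SAMPLE = """title,genres,rating
Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.0
Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.1
Star Wars,Adventure|Animation|Children|Comedy|Fantasy,4.5
Toy Story,Adventure|Animation|Children|Comedy|Fantasy,2.5
"""


def genre_stats(lines):
    """Return {genre: (row_count, distinct_titles, mean_rating)}."""
    titles = defaultdict(set)    # genre -> distinct titles seen
    ratings = defaultdict(list)  # genre -> every rating seen
    for row in csv.DictReader(lines):
        # "Explode": handle the row once for each genre it lists.
        for genre in row["genres"].split("|"):
            titles[genre].add(row["title"])
            ratings[genre].append(float(row["rating"]))
    return {
        g: (len(ratings[g]), len(titles[g]), sum(ratings[g]) / len(ratings[g]))
        for g in sorted(ratings)
    }


stats = genre_stats(io.StringIO(SAMPLE))
for genre, (rows, distinct, mean) in stats.items():
    print(f"{genre:<10} rows={rows} titles={distinct} mean={mean:.3f}")
```

To run this against a real file, pass an open file object instead of the StringIO, for example one opened with open(path, newline=""), where path is wherever the CSV lives. In the pandas version above, swapping 'size' for 'nunique' in the agg dictionary should give the distinct-title count instead of the row count.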
