stickfrosch95
stickfrosch95

Reputation: 15

Python: Counting number of titles of each genres and average rating

I'm very new to python and don't know how to proceed. I have a csv file with over 100k rows in following structure:

title,genres,rating

Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.0

Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.1

Star Wars,Adventure|Animation|Children|Comedy|Fantasy,4.5

Toy Story,Adventure|Animation|Children|Comedy|Fantasy,2.5

.
.
.

I need to analyze the number of titles for each genres and the average rating of each genres.

I have the csv.reader already but I don´t know how to count the titles per each genres and their average rating.

Thanks for every help!

Upvotes: 0

Views: 467

Answers (2)

rodvictor
rodvictor

Reputation: 349

supposing the file has only these five lines, the code would be the following. Nonetheless, it will scale up for more lines as long as you implement the logic for reading your file (instead of creating a list at the start as I did):

file = ["title,genres,rating",
        "Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.0",
        "Lord of the Rings,Adventure|Animation|Children|Comedy|Fantasy,4.1",
        "Star Wars,Adventure|Animation|Children|Comedy|Fantasy,4.5",
        "Toy Story,Adventure|Animation|Children|Comedy|Fantasy,2.5"]

header = True #First line is a header
dictionaryOfGenders = {} #Stores the list of ratings for each gender. The average will be computed later.

for line in file:
    
    if header:
        header=False
        continue
        
    else:
        line = line.strip("\n").split(",") #line[0] is the title of the movie, line[1] is a list of genders with a separator "|", line[2] is the rating of the movie
        genders = line[1].split("|")
        rating = float(line[2])

        for gender in genders: #Several genders are assigned to the same movie, each of them will append this rating to their list of ratings
            if gender in dictionaryOfGenders:
                dictionaryOfGenders[gender].append(rating)
            else:
                dictionaryOfGenders[gender] = [rating]
            
averageRating = {} #Storages the final average for each gender

for gender in dictionaryOfGenders.keys():
    averageRating[gender] = sum(dictionaryOfGenders[gender])/len(dictionaryOfGenders[gender])
    
print (averageRating)

I hope you find it helpful :)

OUTPUT OF THIS SCRIPT:

{'Adventure': 3.775, 'Animation': 3.775, 'Children': 3.775, 'Comedy': 3.775, 'Fantasy': 3.775}

Upvotes: 0

ThePyGuy
ThePyGuy

Reputation: 18466

Split genres on |, explode it, groupby genres, and use agg as size for title and mean for rating.

df['genres']=df['genres'].str.split('|')
df = (df.explode('genres')
        .groupby('genres')[['title', 'rating']]
        .agg({'title':'size', 'rating':'mean'})
      )

OUTPUT:

           title  rating
genres                  
Adventure      4   3.775
Animation      4   3.775
Children       4   3.775
Comedy         4   3.775
Fantasy        4   3.775

Upvotes: 1

Related Questions