Reputation: 79
Hi, I am currently working on an IMDB movie metadata dataframe that looks like this.
I need help finding the average IMDb rating per year, that is, the mean imdb_score per title_year.
I have counted the number of movies per year and dropped years with fewer than 10 movies so that the results stay relevant.
I did this as follows:
years = df['title_year'].value_counts()
years
and then
years2 = years[years >= 10]
years2
which resulted in
2009.0 260
2014.0 252
2006.0 239
2013.0 237
2010.0 230
2015.0 226
2011.0 225
2008.0 225
2012.0 221
2005.0 221
2004.0 214
2002.0 209
2007.0 204
2001.0 188
2000.0 171
2003.0 169
1999.0 168
1998.0 134
1997.0 118
2016.0 106
1996.0 99
1995.0 70
1994.0 54
1993.0 48
1992.0 34
1981.0 33
1989.0 33
1987.0 32
1991.0 31
1988.0 31
1984.0 31
1982.0 30
1990.0 30
1985.0 29
1986.0 26
1980.0 24
1983.0 22
1978.0 16
1977.0 16
1979.0 16
1970.0 12
1971.0 11
1968.0 11
1969.0 10
1964.0 10
1976.0 10
Name: title_year, dtype: int64
Now I am not sure how to find the average IMDb rating per year, which I would like to plot as a graph afterwards. Can anybody help me?
Upvotes: 0
Views: 1210
Reputation: 6495
You can use pandas.DataFrame.groupby:
year_avg_score = df.loc[df['title_year'].isin(years2.index)].groupby('title_year')['imdb_score'].mean()
Step by step:
df.loc[df['title_year'].isin(years2.index)] keeps only the rows whose year has at least 10 movies, using the years2 you already computed.
.groupby('title_year') groups those rows by title_year.
['imdb_score'] selects the imdb_score column.
.mean() computes the average imdb_score within each group.
The result, year_avg_score, is a Series with the year as its index and the average score as its values.
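To see the whole chain run end to end, here is a minimal sketch on a tiny made-up dataframe. The values are invented for illustration, and the threshold is lowered from 10 to a hypothetical MIN_MOVIES = 2 so the toy data has something to filter; with the real dataframe you would keep 10.

```python
import pandas as pd

# Toy stand-in for the IMDB dataframe (invented values, not real data).
df = pd.DataFrame({
    "title_year": [2000.0, 2000.0, 2000.0, 2001.0, 2001.0, 1999.0],
    "imdb_score": [6.0, 7.0, 8.0, 5.0, 7.0, 9.0],
})

# Hypothetical threshold for the toy data; the question uses 10.
MIN_MOVIES = 2

# Count movies per year and keep only years with enough movies.
years = df["title_year"].value_counts()
years2 = years[years >= MIN_MOVIES]

# Filter rows to the kept years, group by year, average the score.
year_avg_score = (
    df.loc[df["title_year"].isin(years2.index)]
      .groupby("title_year")["imdb_score"]
      .mean()
)

print(year_avg_score)
```

Because the result is a Series indexed by year, calling year_avg_score.plot() should draw the average score per year as a line chart, assuming matplotlib is installed.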
Upvotes: 1