Pandas: merge dataframe rows and take an average of the second column values

Question

I have a two-column data frame with the first column containing a date (yyyy-mm-dd) and the second a rating out of five, i.e. '1' or '2' etc. The df is in order by date, with dates descending from the first row.

I am looking for a way to merge the rows containing identical date values - i.e. all of the 2021-05-05 and then take the average of all of the rating values for that given date to provide the corresponding rating average for that date.

For instance, if my df looks like this:

    Date        Rating
0  2021-05-05   1
1  2021-05-05   3
2  2021-05-05   2
3  2021-05-04   4
4  2021-05-04   6
5  2021-05-04   5

I want to merge it so it becomes like so:

    Date        Rating
0  2021-05-05   2
1  2021-05-04   5

SeaBean · Accepted Answer

You can use df.groupby() with sort=False (to maintain original sorting sequence) and as_index=False (to keep Date column as data column instead of row index). Use built-in aggregate function .mean() to get the average.

df.groupby('Date', as_index=False, sort=False)['Rating'].mean()

Output:

         Date  Rating
0  2021-05-05       2
1  2021-05-04       5

Pandas: merge dataframe rows and take an average of the second column values

Answers (2)

Related Questions