Reputation: 23
I have a two-column data frame with the first column containing a date (yyyy-mm-dd) and the second a rating out of five, i.e. '1' or '2' etc. The df is in order by date, with dates descending from the first row.
I am looking for a way to merge the rows containing identical date values - i.e. all of the 2021-05-05 and then take the average of all of the rating values for that given date to provide the corresponding rating average for that date.
For instance, if my df looks like this:
Date Rating
0 2021-05-05 1
1 2021-05-05 3
2 2021-05-05 2
3 2021-05-04 4
4 2021-05-04 6
5 2021-05-04 5
I want to merge it so it becomes like so:
Date Rating
0 2021-05-05 2
1 2021-05-04 5
Upvotes: 2
Views: 2613
Reputation: 23217
You can use df.groupby()
with sort=False
(to maintain original sorting sequence) and as_index=False
(to keep Date
column as data column instead of row index). Use built-in aggregate function .mean()
to get the average.
df.groupby('Date', as_index=False, sort=False)['Rating'].mean()
Output:
Date Rating
0 2021-05-05 2
1 2021-05-04 5
Upvotes: 2
Reputation: 2223
You can try pandas .groupby()
and .mean()
result = df.groupby('Date', sort=False)['Rating'].mean().reset_index()
Upvotes: 0