Reputation: 67
I have a DataFrame df and what I want to do is the following:
What I tried is:
median_old = df.sort_values('user_id').groupby('user_id')['total_play_seconds'].sum().median()
Although I believe my output is correct, the online course won't let me proceed, stating that the median value is incorrect.
Where did I go wrong? As this is a task from an online course, I don't have a reproducible example, but I hope the problem is clear.
Upvotes: 0
Views: 1358
Reputation: 8318
I'll base my answer on the example taken from:
https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
print(df)
median = df.sort_values('Team').groupby("Team")["Points"].sum().to_frame()["Points"].median()
print(median)
As you can see, after the groupby and sum you get a pandas Series object rather than a DataFrame, so the median is computed over the per-group sums and not within each group. I believe all you need to do is add to_frame, select the column, and then calculate the median with the same logic you used to calculate the sum.
So in your case it should be:
median_old = df.sort_values('user_id').groupby('user_id')['total_play_seconds'].sum().to_frame()["total_play_seconds"].median()
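It can help to check what each step actually returns. The sketch below reuses the example data from above (not the course data, which isn't available here), prints the type of the grouped sum, and computes the median both directly on the Series and after the to_frame round trip, so the two results can be compared:

```python
import pandas as pd

# Example data from the tutorial linked above, not the course data.
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# groupby + sum on a single selected column returns a Series indexed by Team.
# sort_values is left out here: the per-group sums do not depend on row order.
sums = df.groupby("Team")["Points"].sum()
print(type(sums).__name__)

# Median taken directly on the Series of per-team sums.
median_series = sums.median()

# Median taken after converting to a DataFrame and selecting the column again.
median_frame = sums.to_frame()["Points"].median()

print(median_series, median_frame)
```

On this data both calls print 1536.0, since a Series also provides median. If the course still rejects the value, it may be worth checking which aggregation the task expects, which is an assumption that can't be verified from the question alone.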
Upvotes: 1