Reputation: 383
I need to groupby on a single field, then get the nlargest(14) records on a date field, then get the mean of another field, and I am getting stuck on the logic.
data = [['NRB000043', nan, None, Timestamp('2020-01-27 00:00:00')],
['NRB000042', nan, None, Timestamp('2020-01-27 00:00:00')],
['483951076', nan, None, Timestamp('2020-01-27 00:00:00')],
['080699991', nan, None, Timestamp('2020-01-27 00:00:00')],
['NRB000045', nan, None, Timestamp('2020-01-27 00:00:00')],
['530639995', 23.0, None, Timestamp('2020-01-27 00:00:00')],
['530639997', 24.0, None, Timestamp('2020-01-27 00:00:00')]]
df = pd.DataFrame(data, columns=['sid', 'measure', 'co_unit', 'timedate'])
series = df.groupby('sid')['timedate'].nlargest(14)
So this is what I have, but I am stuck trying to get the mean of the measure
field. Can someone please assist me with the proper logic?
Thanks!
Upvotes: 0
Views: 55
Reputation: 21709
You can use .agg
method:
df.groupby('sid').agg({'timedate': lambda x: x.nlargest(14),
'measure': 'mean'})
print(df)
timedate measure
sid
080699991 2020-01-27 NaN
483951076 2020-01-27 NaN
530639995 2020-01-27 23.0
530639997 2020-01-27 24.0
NRB000042 2020-01-27 NaN
NRB000043 2020-01-27 NaN
NRB000045 2020-01-27 NaN
Upvotes: 1