JabberJabber
JabberJabber

Reputation: 383

Groupy Pandas DataFrame with Multiple Conditions

I need to groupby on a single field, then get the nlargest(14) records on a date field, then get the mean of another field, and I am getting stuck on the logic.

data = [['NRB000043', nan, None, Timestamp('2020-01-27 00:00:00')],
 ['NRB000042', nan, None, Timestamp('2020-01-27 00:00:00')],
 ['483951076', nan, None, Timestamp('2020-01-27 00:00:00')],
 ['080699991', nan, None, Timestamp('2020-01-27 00:00:00')],
 ['NRB000045', nan, None, Timestamp('2020-01-27 00:00:00')],
 ['530639995', 23.0, None, Timestamp('2020-01-27 00:00:00')],
 ['530639997', 24.0, None, Timestamp('2020-01-27 00:00:00')]]

df = pd.DataFrame(data, columns=['sid', 'measure', 'co_unit', 'timedate'])

series = df.groupby('sid')['timedate'].nlargest(14)


So this is what I have, but I am stuck trying to get the mean of the measure field. Can someone please assist me with the proper logic?

Thanks!

Upvotes: 0

Views: 55

Answers (1)

YOLO
YOLO

Reputation: 21709

You can use .agg method:

df.groupby('sid').agg({'timedate': lambda x: x.nlargest(14),
                       'measure': 'mean'})

print(df)

            timedate  measure
sid                          
080699991 2020-01-27      NaN
483951076 2020-01-27      NaN
530639995 2020-01-27     23.0
530639997 2020-01-27     24.0
NRB000042 2020-01-27      NaN
NRB000043 2020-01-27      NaN
NRB000045 2020-01-27      NaN

Upvotes: 1

Related Questions