Onur Serçe
Onur Serçe

Reputation: 29

Is there a way to get multiple averages after groupby function in pandas?

I have the following pandas timeseries dataframe:

Index   Time    Centre position X   Centre position Y   Datafile    Group   Zone    Timeframe   dV
8789    1257.318    180.0   201.0   CHR1    CHR Zone A  Before stimulation  15.625000
8790    1257.462    181.0   195.0   CHR1    CHR Zone A  Before stimulation  42.241406
8791    1257.590    184.0   188.0   CHR1    CHR Zone A  Before stimulation  59.498227
8792    1257.718    187.0   184.0   CHR1    CHR Zone B  Before stimulation  39.062500
8793    1257.862    190.0   176.0   CHR1    CHR Zone B  Before stimulation  59.333359
8794    1257.927    190.0   173.0   CHR1    CHR Zone A  Before stimulation  46.153846
8795    1258.054    192.0   171.0   CHR1    CHR Zone A  Before stimulation  22.271080
8796    1258.198    192.0   172.0   CHR1    CHR Zone C  After stimulation   6.944444
8797    1258.326    192.0   171.0   CHR1    CHR Zone C  After stimulation   7.812500
8798    1258.454    191.0   169.0   CHR1    CHR Zone A  After stimulation   17.469281
8799    1258.598    191.0   168.0   CHR1    CHR Zone A  After stimulation   6.944444
8800    1258.726    192.0   165.0   CHR1    CHR Zone A  After stimulation   24.705294

I would like to extract the average speed (dV) grouped by Timeframe and Zone, however, since the data is sequential, I would like to get multiple averages for a given Timeframe and Zone. I couldn't figure out an elegant way to achieve this since groupby averages all the values and outputs a single value.

Expected output is: Expected output

Thank you very much in advance!

Upvotes: 1

Views: 42

Answers (1)

CameLion
CameLion

Reputation: 106

The first thing you need do is to create a reference column. A very naive way is like

df.loc[:,'Zone_shift']=df.loc[:,'Zone'].shift(1)
df.loc[:,'Timeframe_shift']=df.loc[:,'Timeframe'].shift(1)
df.loc[:,'Groupby'] = df.apply(lambda x: 0 if x['Zone']==x['Zone_shift'] and x['Timeframe']==x['Timeframe_shift'] else 1, axis=1)
df.loc[:,'Groupby'] = df.loc[:,'Groupby'].cumsum()

After adding the reference data, the dataframe is look like

    Zone    Timeframe               dV  Zone_shift  Timeframe_shift Groupby
0   ZoneA   Beforestimulation   15.625    nan        nan            1
1   ZoneA   Beforestimulation   42.241  ZoneA   Beforestimulation   1
2   ZoneA   Beforestimulation   59.498  ZoneA   Beforestimulation   1
3   ZoneB   Beforestimulation   39.062  ZoneA   Beforestimulation   2
4   ZoneB   Beforestimulation   59.333  ZoneB   Beforestimulation   2
5   ZoneA   Beforestimulation   46.153  ZoneB   Beforestimulation   3
6   ZoneA   Beforestimulation   22.271  ZoneA   Beforestimulation   3
7   ZoneC   Afterstimulation    6.9444  ZoneA   Beforestimulation   4
8   ZoneC   Afterstimulation    7.8125  ZoneC   Afterstimulation    4
9   ZoneA   Afterstimulation    17.469  ZoneC   Afterstimulation    5
10  ZoneA   Afterstimulation    6.9444  ZoneA   Afterstimulation    5
11  ZoneA   Afterstimulation    24.705  ZoneA   Afterstimulation    5

Then you only need to groupby by

df.groupby(['Groupby','Zone','Timeframe']).mean()

And the final output would be like

Groupby Zone    Timeframe   dV
1   ZoneA   Beforestimulation   39.12154433333333
2   ZoneB   Beforestimulation   49.1979295
3   ZoneA   Beforestimulation   34.212463
4   ZoneC   Afterstimulation    7.378472
5   ZoneA   Afterstimulation    16.373006333333333

Upvotes: 1

Related Questions