Reputation: 29
I have the following pandas timeseries dataframe:
Index Time Centre position X Centre position Y Datafile Group Zone Timeframe dV
8789 1257.318 180.0 201.0 CHR1 CHR Zone A Before stimulation 15.625000
8790 1257.462 181.0 195.0 CHR1 CHR Zone A Before stimulation 42.241406
8791 1257.590 184.0 188.0 CHR1 CHR Zone A Before stimulation 59.498227
8792 1257.718 187.0 184.0 CHR1 CHR Zone B Before stimulation 39.062500
8793 1257.862 190.0 176.0 CHR1 CHR Zone B Before stimulation 59.333359
8794 1257.927 190.0 173.0 CHR1 CHR Zone A Before stimulation 46.153846
8795 1258.054 192.0 171.0 CHR1 CHR Zone A Before stimulation 22.271080
8796 1258.198 192.0 172.0 CHR1 CHR Zone C After stimulation 6.944444
8797 1258.326 192.0 171.0 CHR1 CHR Zone C After stimulation 7.812500
8798 1258.454 191.0 169.0 CHR1 CHR Zone A After stimulation 17.469281
8799 1258.598 191.0 168.0 CHR1 CHR Zone A After stimulation 6.944444
8800 1258.726 192.0 165.0 CHR1 CHR Zone A After stimulation 24.705294
I would like to extract the average speed (dV) grouped by Timeframe
and Zone
, however, since the data is sequential, I would like to get multiple averages for a given Timeframe and Zone. I couldn't figure out an elegant way to achieve this since groupby averages all the values and outputs a single value.
Thank you very much in advance!
Upvotes: 1
Views: 42
Reputation: 106
The first thing you need do is to create a reference column. A very naive way is like
df.loc[:,'Zone_shift']=df.loc[:,'Zone'].shift(1)
df.loc[:,'Timeframe_shift']=df.loc[:,'Timeframe'].shift(1)
df.loc[:,'Groupby'] = df.apply(lambda x: 0 if x['Zone']==x['Zone_shift'] and x['Timeframe']==x['Timeframe_shift'] else 1, axis=1)
df.loc[:,'Groupby'] = df.loc[:,'Groupby'].cumsum()
After adding the reference data, the dataframe is look like
Zone Timeframe dV Zone_shift Timeframe_shift Groupby
0 ZoneA Beforestimulation 15.625 nan nan 1
1 ZoneA Beforestimulation 42.241 ZoneA Beforestimulation 1
2 ZoneA Beforestimulation 59.498 ZoneA Beforestimulation 1
3 ZoneB Beforestimulation 39.062 ZoneA Beforestimulation 2
4 ZoneB Beforestimulation 59.333 ZoneB Beforestimulation 2
5 ZoneA Beforestimulation 46.153 ZoneB Beforestimulation 3
6 ZoneA Beforestimulation 22.271 ZoneA Beforestimulation 3
7 ZoneC Afterstimulation 6.9444 ZoneA Beforestimulation 4
8 ZoneC Afterstimulation 7.8125 ZoneC Afterstimulation 4
9 ZoneA Afterstimulation 17.469 ZoneC Afterstimulation 5
10 ZoneA Afterstimulation 6.9444 ZoneA Afterstimulation 5
11 ZoneA Afterstimulation 24.705 ZoneA Afterstimulation 5
Then you only need to groupby by
df.groupby(['Groupby','Zone','Timeframe']).mean()
And the final output would be like
Groupby Zone Timeframe dV
1 ZoneA Beforestimulation 39.12154433333333
2 ZoneB Beforestimulation 49.1979295
3 ZoneA Beforestimulation 34.212463
4 ZoneC Afterstimulation 7.378472
5 ZoneA Afterstimulation 16.373006333333333
Upvotes: 1