hema
hema

Reputation: 21

Calculate mean of certain rows for each group after group by Pandas

I have a dataframe which records 2 parameters and time for many IDs. There are malfunction situations that all parameters are 0.0, but date exists, like row 5.

 0   id  Param1  Param2   date
 1   1   1.45    6.47     2014-09-01 
 2   1   2.84    66.7     2014-09-03 
 3   2   -0.21   30       2014-11-11
 4   2   9970    3.9      2014-12-09
 5   2   0.0     0.0      2014-12-31
 6   2   8.26    1.25     2015-05-19
 7   3   90.1    45.9     2014-09-01
 8   3   9.01    4.0      2014-10-05

Such malfunction situation occurs once and only once for every ID. I would like to calculate the mean of parameters that are 10 rows above and below the malfunction row of this ID, separately.

How should I do this? I attempted to use groupby to isolate each ID, but failed to locate the malfunction records of each ID group and do calculations separately for rows above and below.

The dataframe contains ~3000 IDs and 900,000,000 rows of records. Any help would be much appreciated. Thank you.

Upvotes: 2

Views: 281

Answers (1)

Tihom_Rahtus
Tihom_Rahtus

Reputation: 126

Check below code, if it provides desired output. Note that I have used rolling window of 3 row considering smaller set of data.

Data frame :

import pandas as pd
import numpy as np
Idata= {'id':[1,1,2,2,2,2,3,3,3],'param1':[1.45,2.84,-0.21,9970,0,8.26,0,90.1,9.01],'param2':[6.47,66.7,30,3.9,0,1.25,0,45.9,4],'date':['01-09-2014','03-09-2014','11-11-2014','09-12-2014','31-12-2014','19-05-2015','01-09-2014','05-10-2014','05-11-2014'],}
df=pd.DataFrame(Idata)

Code :

dflist=df.loc[df['param1']==0].index.values.tolist()
dflist2=np.array([list(range(x-1,x+2)) for x in dflist]).ravel().tolist()
df3=df.loc[dflist2,:][['param1','param2']].rolling(3,center=True, axis=0).mean().drop_duplicates()
df_final=df3.loc[df['param1']==0]

Output:

index   param1  param2
4   3326.086667 1.716667
6   32.786667   15.716667

Upvotes: 1

Related Questions