Reputation: 21
I have a dataframe which records 2 parameters and time for many IDs. There are malfunction situations that all parameters are 0.0, but date exists, like row 5.
0 id Param1 Param2 date
1 1 1.45 6.47 2014-09-01
2 1 2.84 66.7 2014-09-03
3 2 -0.21 30 2014-11-11
4 2 9970 3.9 2014-12-09
5 2 0.0 0.0 2014-12-31
6 2 8.26 1.25 2015-05-19
7 3 90.1 45.9 2014-09-01
8 3 9.01 4.0 2014-10-05
Such malfunction situation occurs once and only once for every ID. I would like to calculate the mean of parameters that are 10 rows above and below the malfunction row of this ID, separately.
How should I do this? I attempted to use groupby to isolate each ID, but failed to locate the malfunction records of each ID group and do calculations separately for rows above and below.
The dataframe contains ~3000 IDs and 900,000,000 rows of records. Any help would be much appreciated. Thank you.
Upvotes: 2
Views: 281
Reputation: 126
Check below code, if it provides desired output. Note that I have used rolling window of 3 row considering smaller set of data.
Data frame :
import pandas as pd
import numpy as np
Idata= {'id':[1,1,2,2,2,2,3,3,3],'param1':[1.45,2.84,-0.21,9970,0,8.26,0,90.1,9.01],'param2':[6.47,66.7,30,3.9,0,1.25,0,45.9,4],'date':['01-09-2014','03-09-2014','11-11-2014','09-12-2014','31-12-2014','19-05-2015','01-09-2014','05-10-2014','05-11-2014'],}
df=pd.DataFrame(Idata)
Code :
dflist=df.loc[df['param1']==0].index.values.tolist()
dflist2=np.array([list(range(x-1,x+2)) for x in dflist]).ravel().tolist()
df3=df.loc[dflist2,:][['param1','param2']].rolling(3,center=True, axis=0).mean().drop_duplicates()
df_final=df3.loc[df['param1']==0]
Output:
index param1 param2
4 3326.086667 1.716667
6 32.786667 15.716667
Upvotes: 1