Reputation: 131
I am relatively new to Pandas, so my sincere apologies if my question is not framed properly. I have a data set here:
t HVAC1_S1 HVAC2_S1 ... HVAC4_S1 HVAC5_S1 HVAC6_S1
0 2009-08-18 18:12 711.0 0.0 ... 0.0 0.0 0.0
1 2009-08-18 18:14 705.0 734.0 ... 0.0 725.0 711.0
2 2009-08-18 18:15 0.0 0.0 ... 730.0 0.0 0.0
3 2009-08-18 18:29 0.0 721.0 ... 716.0 718.0 712.0
4 2009-08-18 18:30 705.0 0.0 ... 0.0 0.0 0.0
... ... ... ... ... ... ...
156033 2012-11-09 21:59 714.0 720.0 ... 739.0 0.0 727.0
156034 2012-11-09 22:00 0.0 0.0 ... 0.0 743.0 0.0
156035 2012-11-09 22:14 723.0 729.0 ... 734.0 743.0 732.0
156036 2012-11-09 22:29 718.0 732.0 ... 0.0 739.0 725.0
156037 2012-11-09 22:30 0.0 0.0 ... 739.0 0.0 0.0
If you look at the timestamps, for example at index 1, i.e. 2009-08-18 18:14, there were 5 readings taken (one is not visible because of the ...). There are actually 6 readings in total, but the last one was recorded at the next minute, i.e. at 2009-08-18 18:15. It is a bit confusing because there are no seconds here. All 6 readings are taken at the same time and then measured again 15 minutes later, but during a measurement one value often gets recorded at the next minute, which puts it in a new row.
I want to set a tolerance of 1 minute, merge the values of such rows (like those at index 2, 4, 156034, 156037) into their previous rows, and delete those rows from the data frame, so, for example, I would like to obtain my dataframe as:
t HVAC1_S1 HVAC2_S1 ... HVAC4_S1 HVAC5_S1 HVAC6_S1
0 2009-08-18 18:12 711.0 0.0 ... 0.0 0.0 0.0
1 2009-08-18 18:14 705.0 734.0 ... 730.0 725.0 711.0
2 2009-08-18 18:29 705.0 721.0 ... 716.0 718.0 712.0
... ... ... ... ... ... ...
156033 2012-11-09 21:59 714.0 720.0 ... 739.0 743.0 727.0
156035 2012-11-09 22:14 723.0 729.0 ... 734.0 743.0 732.0
156036 2012-11-09 22:29 718.0 732.0 ... 739.0 739.0 725.0
Note: index 0 can be neglected because it is outside the 1-minute tolerance.
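For reference, the visible part of the first few rows can be rebuilt as a small frame like this (the elided HVAC3_S1 column is left out because its values are not shown), in case it helps to reproduce the problem:

import pandas as pd

# Minimal reconstruction of the first five rows shown above
# (HVAC3_S1 is omitted; its values are hidden behind the ...).
df = pd.DataFrame({
    't': pd.to_datetime(['2009-08-18 18:12', '2009-08-18 18:14',
                         '2009-08-18 18:15', '2009-08-18 18:29',
                         '2009-08-18 18:30']),
    'HVAC1_S1': [711.0, 705.0, 0.0, 0.0, 705.0],
    'HVAC2_S1': [0.0, 734.0, 0.0, 721.0, 0.0],
    'HVAC4_S1': [0.0, 0.0, 730.0, 716.0, 0.0],
    'HVAC5_S1': [0.0, 725.0, 0.0, 718.0, 0.0],
    'HVAC6_S1': [0.0, 711.0, 0.0, 712.0, 0.0],
})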
Please provide your valuable suggestions. Thank you!
Upvotes: 1
Views: 570
Reputation: 59549
You can use an asof merge (pd.merge_asof) to merge the DataFrame with itself. Pick the forward direction and do not allow exact matches (otherwise each row would always merge with itself), and specify a tolerance of 1 minute. We need to bring the index along in the merge so that afterwards we can remove the rows that merged into others.
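As a minimal sketch of the pairing step (using only the timestamps and HVAC1_S1 values visible in the question), a forward asof merge with a 1-minute tolerance matches each row to the next strictly later row, if any, within that window:

import pandas as pd

toy = pd.DataFrame({
    't': pd.to_datetime(['2009-08-18 18:14', '2009-08-18 18:15',
                         '2009-08-18 18:29']),
    'HVAC1_S1': [705.0, 0.0, 0.0],
})

# 18:14 pairs with the 18:15 row (the value column comes back as
# HVAC1_S1_x / HVAC1_S1_y); 18:15 and 18:29 have no later row within
# 1 minute, so their _y side is NaN.
paired = pd.merge_asof(toy, toy, on='t',
                       direction='forward',
                       allow_exact_matches=False,
                       tolerance=pd.Timedelta('1min'))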
After the merge we'll have _x and _y suffixed columns; we can drop the helper index columns, strip the suffixes, and group along the columns axis to combine each pair.
t HVAC1_S1 HVAC2_S1 HVAC4_S1 HVAC5_S1 HVAC6_S1
0 2009-08-18 18:12:00 711.0 0.0 0.0 0.0 0.0
1 2009-08-18 18:14:00 705.0 734.0 0.0 725.0 711.0
2 2009-08-18 18:15:00 0.0 0.0 730.0 0.0 0.0
3 2009-08-18 18:29:00 0.0 721.0 716.0 718.0 712.0
4 2009-08-18 18:30:00 705.0 0.0 0.0 0.0 0.0
import pandas as pd

# df['t'] = pd.to_datetime(df['t'])  # if 't' is not already datetime

# Pair each row with the next row that is strictly later but within 1 minute.
# reset_index keeps the original index available as a column on both sides.
res = pd.merge_asof(df.reset_index(), df.reset_index(),
                    on='t',
                    direction='forward',
                    allow_exact_matches=False,
                    tolerance=pd.Timedelta('1min'))
res = res.set_index('t')

# Drop rows whose values were already merged into the previous row,
# then remove the helper index columns.
res = res[~res['index_x'].isin(res['index_y'])]
res = res.drop(columns=['index_x', 'index_y'])

# Strip the _x/_y suffixes and sum each pair of columns.
res = res.groupby(res.columns.str.rsplit('_', n=1).str[0], axis=1).sum()
HVAC1_S1 HVAC2_S1 HVAC4_S1 HVAC5_S1 HVAC6_S1
t
2009-08-18 18:12:00 711.0 0.0 0.0 0.0 0.0
2009-08-18 18:14:00 705.0 734.0 730.0 725.0 711.0
2009-08-18 18:29:00 705.0 721.0 716.0 718.0 712.0
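Note that DataFrame.groupby(..., axis=1) is deprecated in newer pandas versions. If that warning shows up, one equivalent sketch (untested against your full data) is to replace the last line above by splitting the suffixed columns and adding them directly:

# Equivalent to the final groupby(axis=1).sum() above: split the merged
# frame into its _x and _y halves, strip the suffixes, and add the pairs
# (rows without a forward match contribute 0 from the _y side).
left = res.filter(regex='_x$').rename(columns=lambda c: c[:-2])
right = res.filter(regex='_y$').rename(columns=lambda c: c[:-2])
res = left.add(right, fill_value=0)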
Upvotes: 1