Reputation: 592
I have the following sensor data dataframe:
Data_Digital Data_Analog Time
1 10 2015-02-01 00:00:00
1 12 2015-02-01 00:00:05
1 25 2015-02-01 07:45:07
1 25 2015-02-01 07:45:08
1 25 2015-02-01 21:45:10
0 25 2015-03-04 00:00:08
I need to compare the 'Time' of position 0 and the 'Time' of position 1. If the difference between the two teams is greater than six hours they must belong to different classes. However, if the time difference is small (< 6 hours) they must belong to the same class. I need this class to be represented in a new dataframe column.
The desired output is:
Data_Digital Data_Analog Time New_Col_Target
1 10 2015-02-01 00:00:00 1 # init with 1
1 12 2015-02-01 00:00:05 1
1 25 2015-02-01 07:45:07 2 # far from the previous
1 25 2015-02-01 07:45:08 2
1 25 2015-02-01 21:45:10 3 # far from the previous
0 25 2015-03-04 00:00:08 4 # far from the previous
Just below is the original dataframe:
import pandas as pd
df = pd.DataFrame({'Data_Digital': [1, 1, 1, 1, 1, 0],
'Data_Analog': [10, 12, 25, 25, 25, 25],
'Time': ['2015-02-01 00:00:00', '2015-02-01 00:00:05','2015-02-01 07:45:07',
'2015-02-01 07:45:08', '2015-02-01 21:45:10', '2015-03-04 00:00:08']})
print(df)
I tried to do (but it's wrong):
index = 0
index2 = 1
df['New_Col_Target'] = 1
for i in range(0, len(df) -1):
for j in range(1, len(df)):
if(abs(pd.to_datetime(df['Time'].iloc[i]) -
pd.to_datetime(df['Time'].iloc[j])) >
pd.to_timedelta('0 day 06:00:00')):
# I don't know how to do the assignments
df['New_Col_Target'].iloc[i] = index
else:
# I don't know how to do the assignments
df['New_Col_Target'].iloc[i] = index2
index2 += 1
# New process
Date Init Date End Mean_Dig Mean_Analog
2015-02-01 00:00:00 2015-02-01 00:00:05 1 11
2015-02-01 07:45:07 2015-02-01 07:45:08 1 25
2015-02-01 07:45:08 2015-02-01 07:45:08 1 25
2015-03-04 00:00:08 2015-03-04 00:00:08 0 25
df_mean_group_New_Col_Target = pd.DataFrame({'Date Init': ['2015-02-01 00:00:00', '2015-02-01 07:45:07', '2015-02-01 07:45:08', '2015-03-04 00:00:08'],
'Date End': ['2015-02-01 00:00:05', '2015-02-01 07:45:08', '2015-02-01 07:45:08', '2015-03-04 00:00:08'],
'Mean_Data_Digital': [1, 1, 1, 0],
'Mean_Data_Analog': [11, 25, 25, 25]})
print(df_mean_group_New_Col_Target)
Upvotes: 2
Views: 37
Reputation: 42886
Use diff
, pd.Timedelta
and cumsum
:
df['New_col_target'] = (df['Time'].diff() > pd.Timedelta(hours=6)).cumsum().add(1)
Output
Data_Digital Data_Analog Time New_col_target
0 1 10 2015-02-01 00:00:00 1
1 1 12 2015-02-01 00:00:05 1
2 1 25 2015-02-01 07:45:07 2
3 1 25 2015-02-01 07:45:08 2
4 1 25 2015-02-01 21:45:10 3
5 0 25 2015-03-04 00:00:08 4
If your Time
column is not datetime
yet, use:
df['Time'] = pd.to_datetime(df['Time'])
Data_Digital
:We have to use groupby
:
m1 = df.groupby('Data_Digital')['Time'].diff().ge(pd.Timedelta(hours=6))
m2 = df['Data_Digital'].diff().ne(0)
df['New_col_target'] = (m1|m2).cumsum()
output
Data_Digital Data_Analog Time New_col_target
0 1 10 2015-02-01 00:00:00 1
1 1 12 2015-02-01 00:00:05 1
2 1 25 2015-02-01 07:45:07 2
3 1 25 2015-02-01 07:45:08 2
4 1 25 2015-02-01 21:45:10 3
5 0 25 2015-03-04 00:00:08 4
We can use groupby.mean
:
df.groupby('New_col_target',as_index=False)[['Data_Digital', 'Data_Analog']].mean()
Or
df.groupby('New_col_target',as_index=False).agg({'Data_Digital':'mean',
'Data_Analog':'mean'})
Or if you have pandas >= 0.25.0
(check with pd.__version__
), we can use named_agreggations
:
df.groupby('New_col_target').agg(
Digital_mean=('Data_Digital', 'mean'),
Analog_mean=('Data_Analog', 'mean')
).reset_index()
Output
New_col_target Data_Digital Data_Analog
0 1 1 11
1 2 1 25
2 3 1 25
3 4 0 25
Output named aggregations
New_col_target Digital_mean Analog_mean
0 1 1 11
1 2 1 25
2 3 1 25
3 4 0 25
Upvotes: 3