Reputation: 296
I am working on a dataframe and need to group rows based on the value of the index. The index is an hourly timestamp, but some hours are missing from the dataframe (because they do not satisfy a specific condition). So I need to group together all consecutive hours, and start a new group whenever an hour is missing.
The sample below shows what I want to achieve:
Timestamp Value
1/2/2017 1:00 231.903601
1/2/2017 2:00 228.225897
1/2/2017 7:00 211.998416
1/2/2017 8:00 227.219204
1/2/2017 9:00 229.203123
1/3/2017 6:00 237.907033
1/3/2017 7:00 206.684276
1/3/2017 8:00 228.4801
The output should be (start and end timestamp of each group, and the average value):
Timestamp Avg_Value
1/2/2017 1:00-1/2/2017 2:00 230.06
1/2/2017 7:00-1/2/2017 9:00 222.8
1/3/2017 6:00-1/3/2017 8:00 224.35
Could you please help me with a way to do this with Python dataframes? Thank you.
Upvotes: 1
Views: 120
Reputation: 59549
First convert the Timestamp column to datetime values.
Then form groups by taking the cumulative sum of a Series that flags rows where the time difference from the previous row is not 1 hour. Use .agg to get the relevant calculations for each column.
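import pandas as pd
df['Timestamp'] = pd.to_datetime(df.Timestamp, format='%m/%d/%Y %H:%M')
# Flag rows whose gap to the previous row is not one hour; the cumulative sum numbers the groups.
s = df.Timestamp.diff().bfill().dt.total_seconds().ne(3600).cumsum()
# Start, end, and mean of each run of consecutive hours (axis passed by keyword for newer pandas).
df.groupby(s).agg({'Timestamp': ['min', 'max'], 'Value': 'mean'}).rename_axis(None, axis=0)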
            Timestamp                           Value
                  min                 max        mean
0 2017-01-02 01:00:00 2017-01-02 02:00:00  230.064749
1 2017-01-02 07:00:00 2017-01-02 09:00:00  222.806914
2 2017-01-03 06:00:00 2017-01-03 08:00:00  224.357136
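To get closer to the "start-end, average" layout asked for in the question, the same grouping idea can be followed by a small formatting step. The following is only a sketch under some assumptions: the sample data is rebuilt from the question, the timestamps live in a column named Timestamp rather than in the index, and the names group_id, summary, fmt and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": ["1/2/2017 1:00", "1/2/2017 2:00", "1/2/2017 7:00", "1/2/2017 8:00",
                  "1/2/2017 9:00", "1/3/2017 6:00", "1/3/2017 7:00", "1/3/2017 8:00"],
    "Value": [231.903601, 228.225897, 211.998416, 227.219204,
              229.203123, 237.907033, 206.684276, 228.4801],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M")

# A new group starts wherever the gap to the previous row is not exactly one hour.
# The first row has no previous row (NaT), so it is flagged and opens group 1.
group_id = df["Timestamp"].diff().ne(pd.Timedelta(hours=1)).cumsum()

# Named aggregation gives flat column names instead of a column MultiIndex.
summary = df.groupby(group_id).agg(
    start=("Timestamp", "min"),
    end=("Timestamp", "max"),
    Avg_Value=("Value", "mean"),
)

# Join start and end into one label and round the mean to two decimals.
fmt = "%m/%d/%Y %H:%M"  # zero-padded, so the labels read 01/02/2017 01:00
result = pd.DataFrame({
    "Timestamp": summary["start"].dt.strftime(fmt) + "-" + summary["end"].dt.strftime(fmt),
    "Avg_Value": summary["Avg_Value"].round(2),
}).reset_index(drop=True)

print(result)
```

Note that the labels come out zero-padded and the averages are rounded rather than truncated, so they can differ slightly from the figures typed in the question. If the timestamps are in a DatetimeIndex, as the question describes, df.index.to_series().diff() should be usable in place of df["Timestamp"].diff().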
Upvotes: 2