onr

Reputation: 296

Group dataframe rows by index value (hourly timestamp) when the index has gaps

I am working with a dataframe whose rows I need to group based on the value of the index. The index is an hourly timestamp, but some hours are missing from the dataframe (because they do not satisfy a specific condition). I need to group all consecutive hours together, and start a new group whenever an hour is missing.

The sample below illustrates what I want to achieve:

Timestamp   Value
1/2/2017 1:00   231.903601
1/2/2017 2:00   228.225897
1/2/2017 7:00   211.998416
1/2/2017 8:00   227.219204
1/2/2017 9:00   229.203123
1/3/2017 6:00   237.907033
1/3/2017 7:00   206.684276
1/3/2017 8:00   228.4801

The output should be (start and end timestamp of each group, and the average value):

     Timestamp                    Avg_Value
    1/2/2017 1:00-1/2/2017 2:00    230.06
    1/2/2017 7:00-1/2/2017 9:00    222.8
    1/3/2017 6:00-1/3/2017 8:00    224.35

Could you please help me find a way to do this with Python dataframes? Thank you.

Upvotes: 1

Views: 120

Answers (1)

ALollz

Reputation: 59549

First convert the Timestamp column to datetime.

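Then form groups by taking the cumulative sum of a boolean Series that is True wherever the time difference from the previous row is not 1 hour. Each True starts a new group, so consecutive hours share the same group label. Use .agg to get the relevant calculations for each column.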

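import pandas as pd

# Parse the strings into datetimes so that differences can be computed.
df['Timestamp'] = pd.to_datetime(df.Timestamp, format='%m/%d/%Y %H:%M')
# Start a new group wherever the gap to the previous row is not 3600 seconds.
s = df.Timestamp.diff().bfill().dt.total_seconds().ne(3600).cumsum()

# axis must be passed by keyword in recent pandas versions.
df.groupby(s).agg({'Timestamp': ['min', 'max'], 'Value': 'mean'}).rename_axis(None, axis=0)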

Output:

            Timestamp                           Value
                  min                 max        mean
0 2017-01-02 01:00:00 2017-01-02 02:00:00  230.064749
1 2017-01-02 07:00:00 2017-01-02 09:00:00  222.806914
2 2017-01-03 06:00:00 2017-01-03 08:00:00  224.357136
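
The question says the timestamps are in the index rather than in a column, so a variant for that case might look like the sketch below. It rebuilds the sample frame from the question's data. The names `ts`, `group_id`, `start`, `end` and `Avg_Value` are illustrative choices, not part of the code above, and the sketch assumes the index is sorted and has no duplicate timestamps.

```python
import pandas as pd

# Sample data from the question, with the timestamps as the index.
df = pd.DataFrame(
    {"Value": [231.903601, 228.225897, 211.998416, 227.219204,
               229.203123, 237.907033, 206.684276, 228.4801]},
    index=pd.to_datetime(
        ["1/2/2017 1:00", "1/2/2017 2:00", "1/2/2017 7:00", "1/2/2017 8:00",
         "1/2/2017 9:00", "1/3/2017 6:00", "1/3/2017 7:00", "1/3/2017 8:00"],
        format="%m/%d/%Y %H:%M",
    ),
)

# Turn the index into a Series so that .diff() can be applied to it.
ts = df.index.to_series()

# A new group starts wherever the gap to the previous row is not one hour.
# The first row has no previous row (its diff is NaT), so it also starts a group.
group_id = ts.diff().ne(pd.Timedelta(hours=1)).cumsum()

# Named aggregation yields flat column names instead of a column MultiIndex.
out = (
    df.assign(Timestamp=ts)
      .groupby(group_id)
      .agg(start=("Timestamp", "min"),
           end=("Timestamp", "max"),
           Avg_Value=("Value", "mean"))
)
print(out)
```

Comparing against `pd.Timedelta(hours=1)` avoids the conversion to seconds, and because the first row's NaT difference already counts as "not one hour", no `bfill` is needed here.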

Upvotes: 2
