Shravan K
Shravan K

Reputation: 105

Groupby a column value for 2 hour

I have a data frame with columns Date, Time, Distance. I want to groupby distance column for each day with 2 hours interval. I have checked the question asked in stack overflow, but it is not giving me the required output.

The sample code I have pasted is from stack overflow, but the output is not as desired.

uniquedate = np.unique(df2["Date"])
for i in uniquedate:
    data_new1 = df2[df2['Date']==i]
    data_new2 = data_new1[['Real_time', 'Distance_km']].copy()
    data_new2.index = pd.to_timedelta(data_new2.index.astype(str))
    data_new2 = data_new2.resample('2H').sum()
    print(data_new2)

                 Distance_km
00:00:00.000000   276.850844
                 Distance_km
00:00:00.000000   193.115217

I got this output but this is not as required

Upvotes: 1

Views: 199

Answers (2)

user12029183
user12029183

Reputation:

Try this:

data_new2.groupby([data_new2.index,pd.TimeGrouper(freq='2H')]).sum()

Please avoid copy pasting code from stack overflow. More often than not the solution will not suffice for your use case. Plus it's lazy.

EDIT Make your index as datetime or timestamp aand then doing a simple resample should work for you.

import pandas as pd
idx = pd.date_range('2018-01-01', periods=5, freq='H')
ts = pd.Series(range(len(idx)), index=idx)
ts = ts.resample('2H').sum()

Upvotes: 1

SpghttCd
SpghttCd

Reputation: 10880

I don't exactly know if I get you right - mainly because you do not post an example of your data.

However, here's an mcve:

# dtrg = pd.date_range('01.Jan.1999 21:17', '02.Jan.1999 04:23', freq='10min')
# df = pd.DataFrame({'distance': np.random.randint(1, 5, len(dtrg)) * 10}, index=dtrg)

# df.head()
#                      distance
# 1999-01-01 21:17:00        30
# 1999-01-01 21:27:00        40
# 1999-01-01 21:37:00        10
# 1999-01-01 21:47:00        30
# 1999-01-01 21:57:00        30

with this data you could simply group by the datetime index in 2H-slices and sum up the distance:

df.groupby(pd.Grouper(freq='2H')).sum()

                     distance
1999-01-01 20:00:00       140
1999-01-01 22:00:00       340
1999-01-02 00:00:00       370
1999-01-02 02:00:00       300
1999-01-02 04:00:00        40

...and if your dataframe doesn't have a datetime index but date and time as separate string values, you should transform them to datetime and set that as index before:

example dataframe structure:

         date   time  distance
0  01.01.1999  09:00        10

transform date and time columns to datetime index:

df.set_index(pd.to_datetime(df.date.str.cat(df.time, sep=' '))).distance

date
1999-01-01 09:00:00    10
Name: distance, dtype: int64

Upvotes: 0

Related Questions