Reputation: 105
I have a data frame with columns Date, Time, Distance. I want to groupby distance column for each day with 2 hours interval. I have checked the question asked in stack overflow, but it is not giving me the required output.
The sample code I have pasted is from stack overflow, but the output is not as desired.
uniquedate = np.unique(df2["Date"])
for i in uniquedate:
data_new1 = df2[df2['Date']==i]
data_new2 = data_new1[['Real_time', 'Distance_km']].copy()
data_new2.index = pd.to_timedelta(data_new2.index.astype(str))
data_new2 = data_new2.resample('2H').sum()
print(data_new2)
Distance_km
00:00:00.000000 276.850844
Distance_km
00:00:00.000000 193.115217
I got this output but this is not as required
Upvotes: 1
Views: 199
Reputation:
Try this:
data_new2.groupby([data_new2.index,pd.TimeGrouper(freq='2H')]).sum()
Please avoid copy pasting code from stack overflow. More often than not the solution will not suffice for your use case. Plus it's lazy.
EDIT Make your index as datetime or timestamp aand then doing a simple resample should work for you.
import pandas as pd
idx = pd.date_range('2018-01-01', periods=5, freq='H')
ts = pd.Series(range(len(idx)), index=idx)
ts = ts.resample('2H').sum()
Upvotes: 1
Reputation: 10880
I don't exactly know if I get you right - mainly because you do not post an example of your data.
However, here's an mcve:
# dtrg = pd.date_range('01.Jan.1999 21:17', '02.Jan.1999 04:23', freq='10min')
# df = pd.DataFrame({'distance': np.random.randint(1, 5, len(dtrg)) * 10}, index=dtrg)
# df.head()
# distance
# 1999-01-01 21:17:00 30
# 1999-01-01 21:27:00 40
# 1999-01-01 21:37:00 10
# 1999-01-01 21:47:00 30
# 1999-01-01 21:57:00 30
with this data you could simply group by the datetime index in 2H-slices and sum up the distance:
df.groupby(pd.Grouper(freq='2H')).sum()
distance
1999-01-01 20:00:00 140
1999-01-01 22:00:00 340
1999-01-02 00:00:00 370
1999-01-02 02:00:00 300
1999-01-02 04:00:00 40
...and if your dataframe doesn't have a datetime index but date and time as separate string values, you should transform them to datetime and set that as index before:
example dataframe structure:
date time distance
0 01.01.1999 09:00 10
transform date and time columns to datetime index:
df.set_index(pd.to_datetime(df.date.str.cat(df.time, sep=' '))).distance
date
1999-01-01 09:00:00 10
Name: distance, dtype: int64
Upvotes: 0