Reputation: 3112
I have a dataset with a timestamp column. I cannot feed the raw timestamps into a regression model, so I want to reduce each timestamp to its date and group the rows that fall on the same date. How do I go about doing that?
Example data set
print(processed_df.head())
                      date        day  isWeekend  distance     time
15 2016-07-06 14:43:53.923    Tuesday      False     0.000  239.254
17 2016-07-07 09:24:53.928  Wednesday      False     0.000  219.191
18 2016-07-07 09:33:02.291  Wednesday      False     0.000  218.987
37 2016-07-14 22:03:23.355  Wednesday      False     0.636  205.000
46 2016-07-14 23:51:49.696  Wednesday      False     0.103  843.000
I would like the date to be the index, with all rows for the same date (for example, the Wednesday rows) combined into a single row by summing distance and time.
My attempt:
print(new_df.groupby('date').mean().head())
                         distance     time
date
2016-07-06 14:43:53.923       0.0  239.254
2016-07-07 09:24:53.928       0.0  219.191
2016-07-07 09:33:02.291       0.0  218.987
2016-07-07 11:28:26.920       0.0  519.016
2016-07-08 11:59:02.044       0.0  398.971
This failed: the rows are still grouped by the full timestamp rather than by date.
Desired output
            distance     time
date
2016-07-06       0.0  239.254
2016-07-07       0.0  957.194
2016-07-08       0.0  398.971
Upvotes: 1
Views: 49
Reputation: 863611
I think you need groupby by dt.date:
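# cast if the dtype is not already datetime
df['date'] = pd.to_datetime(df['date'])
# select several columns with a list; the bare 'distance', 'time' form fails in recent pandas
print(df.groupby(df['date'].dt.date)[['distance', 'time']].mean())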
            distance     time
date
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-14    0.3695  524.000
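Note that the desired output in the question adds up the values for each day, whereas mean averages them. A minimal sketch of the same grouping with sum, assuming a small DataFrame rebuilt from the five sample rows in the question (the names sample and daily_sum are only for illustration):

```python
import pandas as pd

# Illustrative frame rebuilt from the sample rows shown in the question.
sample = pd.DataFrame({
    'date': ['2016-07-06 14:43:53.923', '2016-07-07 09:24:53.928',
             '2016-07-07 09:33:02.291', '2016-07-14 22:03:23.355',
             '2016-07-14 23:51:49.696'],
    'distance': [0.000, 0.000, 0.000, 0.636, 0.103],
    'time': [239.254, 219.191, 218.987, 205.000, 843.000],
})
sample['date'] = pd.to_datetime(sample['date'])

# Group by calendar date (time of day dropped) and add up each column.
daily_sum = sample.groupby(sample['date'].dt.date)[['distance', 'time']].sum()
print(daily_sum)
```

Swapping sum back to mean gives the averaged result shown above; which one is right depends on whether the model should see daily totals or daily averages.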
Another solution uses resample, but then you need to remove the NaN rows with dropna:
print(df.set_index('date').resample('D')[['distance', 'time']].mean())
            distance     time
date
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-08       NaN      NaN
2016-07-09       NaN      NaN
2016-07-10       NaN      NaN
2016-07-11       NaN      NaN
2016-07-12       NaN      NaN
2016-07-13       NaN      NaN
2016-07-14    0.3695  524.000
print(df.set_index('date').resample('D')[['distance', 'time']].mean().dropna())
            distance     time
date
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-14    0.3695  524.000
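If daily totals are wanted from resample instead of averages, one thing to watch is that sum returns 0 rather than NaN for days with no rows, so dropna alone would keep those empty days. A hedged sketch that passes min_count=1 so empty days become NaN first; the frame sample is an illustrative reconstruction of the question's rows, not the asker's full data:

```python
import pandas as pd

# Illustrative frame rebuilt from the sample rows shown in the question.
sample = pd.DataFrame({
    'date': ['2016-07-06 14:43:53.923', '2016-07-07 09:24:53.928',
             '2016-07-07 09:33:02.291', '2016-07-14 22:03:23.355',
             '2016-07-14 23:51:49.696'],
    'distance': [0.000, 0.000, 0.000, 0.636, 0.103],
    'time': [239.254, 219.191, 218.987, 205.000, 843.000],
})
sample['date'] = pd.to_datetime(sample['date'])

daily = (
    sample.set_index('date')
          .resample('D')[['distance', 'time']]
          .sum(min_count=1)  # days without any rows become NaN instead of 0
          .dropna()          # then drop those empty days
)
print(daily)
```

Unlike the dt.date approach, this keeps a DatetimeIndex, which can be convenient if later steps rely on time-series operations.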
Upvotes: 1