Saras Arya

Reputation: 3112

How to concatenate datetime instance to date in pandas?

I have a dataset with a timestamp column. I cannot feed the raw timestamps into a regression model, so I want to collapse each timestamp to its calendar date and group the rows that fall on the same date. How do I go about doing that?

Example data set

print(processed_df.head())
                      date        day isWeekend  distance     time
15 2016-07-06 14:43:53.923    Tuesday     False     0.000  239.254
17 2016-07-07 09:24:53.928  Wednesday     False     0.000  219.191
18 2016-07-07 09:33:02.291  Wednesday     False     0.000  218.987
37 2016-07-14 22:03:23.355  Wednesday     False     0.636  205.000
46 2016-07-14 23:51:49.696  Wednesday     False     0.103  843.000

I would like the date to be the index, with all rows that fall on the same date (for example, the rows for 2016-07-07) combined into a single row by adding up the distance and time.

My attempt:

print(new_df.groupby('date').mean().head())

                         distance     time
date                                      
2016-07-06 14:43:53.923       0.0  239.254
2016-07-07 09:24:53.928       0.0  219.191
2016-07-07 09:33:02.291       0.0  218.987
2016-07-07 11:28:26.920       0.0  519.016
2016-07-08 11:59:02.044       0.0  398.971

This did not work: the rows are still grouped by the full timestamp rather than by date.

Desired output

            distance     time
date                         
2016-07-06       0.0  239.254
2016-07-07       0.0  957.194
2016-07-08       0.0  398.971

Upvotes: 1

Views: 49

Answers (1)

jezrael

Reputation: 863611

I think you need to group by dt.date, which is the date part of each timestamp:

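import pandas as pd

#cast if dtype is not datetime
df['date'] = pd.to_datetime(df['date'])

#select several columns with a list (double brackets); the tuple form fails in current pandas
print (df.groupby(df['date'].dt.date)[['distance', 'time']].mean())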
            distance     time
date                         
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-14    0.3695  524.000
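
The desired output in the question appears to add the values for each date rather than average them. A minimal sketch of that variant, assuming the sample rows from the question and that a per-date total is what is wanted (the name daily_totals is only illustrative):

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "date": [
        "2016-07-06 14:43:53.923",
        "2016-07-07 09:24:53.928",
        "2016-07-07 09:33:02.291",
        "2016-07-14 22:03:23.355",
        "2016-07-14 23:51:49.696",
    ],
    "distance": [0.000, 0.000, 0.000, 0.636, 0.103],
    "time": [239.254, 219.191, 218.987, 205.000, 843.000],
})

# Cast to datetime so the .dt accessor is available.
df["date"] = pd.to_datetime(df["date"])

# Group by the calendar date and add up the values within each date.
daily_totals = df.groupby(df["date"].dt.date)[["distance", "time"]].sum()
print(daily_totals)
```

Swapping sum() back to mean() gives the averages shown above.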

Another solution uses resample, but it produces NaN rows for days with no data, which then need to be removed with dropna:

print (df.set_index('date').resample('D')[['distance', 'time']].mean())
            distance     time
date                         
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-08       NaN      NaN
2016-07-09       NaN      NaN
2016-07-10       NaN      NaN
2016-07-11       NaN      NaN
2016-07-12       NaN      NaN
2016-07-13       NaN      NaN
2016-07-14    0.3695  524.000

print (df.set_index('date').resample('D')[['distance', 'time']].mean().dropna())
            distance     time
date                         
2016-07-06    0.0000  239.254
2016-07-07    0.0000  219.089
2016-07-14    0.3695  524.000
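
If totals are wanted instead of means, one thing to watch for with resample is that sum() returns 0 for days with no rows, so dropna would not remove them. A minimal sketch that works around this with the min_count argument, assuming the sample rows from the question (the name daily_totals is only illustrative):

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "date": [
        "2016-07-06 14:43:53.923",
        "2016-07-07 09:24:53.928",
        "2016-07-07 09:33:02.291",
        "2016-07-14 22:03:23.355",
        "2016-07-14 23:51:49.696",
    ],
    "distance": [0.000, 0.000, 0.000, 0.636, 0.103],
    "time": [239.254, 219.191, 218.987, 205.000, 843.000],
})
df["date"] = pd.to_datetime(df["date"])

# min_count=1 makes days without any rows come out as NaN instead of 0,
# so dropna() can then remove them.
daily_totals = (
    df.set_index("date")
      .resample("D")[["distance", "time"]]
      .sum(min_count=1)
      .dropna()
)
print(daily_totals)
```

Unlike the dt.date approach, the result keeps a DatetimeIndex, which can be convenient for later time-based slicing.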

Upvotes: 1
