Reputation: 1815
Is this the preferred way to obtain the mean and standard deviation of times grouped by day of the week?
How do I group the mean time and standard deviation first by TargetUser, and second by day_of_week?
Also, how would I convert the series of standard deviations and means to a proper time format? I have tried looping through each series and calling datetime.timedelta(seconds=item), which works, but I would prefer a more idiomatic pandas approach. Thank you for your feedback.
I have a data set that has date time stamps in it as below:
Date        Time      TargetUser
10/10/2012  20:20:01  joe
10/11/2012  02:20:01  bob
10/13/2012  21:20:01  smo
10/16/2012  22:20:01  joe
I am creating a day of week column as below:
df['my_dates'] = pd.to_datetime(df['Date'])
df['day_of_week'] = df['my_dates'].dt.dayofweek
days = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}  # dt.dayofweek uses 0 for Monday
df['day_of_week'] = df['day_of_week'].apply(lambda x: days[x])
I then split the time into hours, minutes, and seconds and add a column with the number of seconds since midnight:
df[['HH', 'MM','SS']] = df['Time'].str.split(':', expand=True)
df['seconds'] = (((df['HH'].astype(int) * 60) + df['MM'].astype(int)) * 60) + df['SS'].astype(int)
I then compute the mean time and standard deviation by day of week as below:
meantime = df['seconds'].groupby([df['day_of_week']]).mean()
std = df['seconds'].groupby([df['day_of_week']]).std(ddof=1)
(Not based on above data) Expected Output:
Name  Day_of_week  Mean      STD
joe   mon          15:01:01  00:08:02
      tue          10:01:01  00:01:06
bob   mon          11:11:11  00:20:30
smo   thur         07:07:07  00:03:02
Upvotes: 0
Views: 3715
Reputation: 61967
You should be able to greatly simplify your work by concatenating Date and Time and then using pandas' excellent datetime accessor, dt.
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df['day_of_week'] = df.DateTime.dt.strftime('%a')
df['seconds'] = pd.to_timedelta(df.DateTime.dt.time.astype(str)).dt.seconds
This gives you:
         Date      Time TargetUser            DateTime day_of_week  seconds
0  10/10/2012  20:20:01        joe 2012-10-10 20:20:01         Wed    73201
1  10/11/2012  02:20:01        bob 2012-10-11 02:20:01         Thu     8401
2  10/13/2012  21:20:01        smo 2012-10-13 21:20:01         Sat    76801
3  10/16/2012  22:20:01        joe 2012-10-16 22:20:01         Tue    80401
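As a sketch of an alternative for the seconds column: subtracting the midnight-normalized timestamp from DateTime yields a timedelta directly, which avoids the round trip through strings. The explicit format string and the since_midnight name are assumptions made for this illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Date': ['10/10/2012', '10/11/2012', '10/13/2012', '10/16/2012'],
    'Time': ['20:20:01', '02:20:01', '21:20:01', '22:20:01'],
    'TargetUser': ['joe', 'bob', 'smo', 'joe'],
})
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                format='%m/%d/%Y %H:%M:%S')

# normalize() resets each timestamp to midnight of the same day,
# so the difference is the time elapsed since midnight.
since_midnight = df['DateTime'] - df['DateTime'].dt.normalize()
df['seconds'] = since_midnight.dt.total_seconds().astype(int)

print(df['seconds'].tolist())  # [73201, 8401, 76801, 80401]
```

The values match the seconds column shown above.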
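Then, to group by user and day of week, do the following, which names the result columns as well. Nested dicts are no longer accepted for renaming in agg as of pandas 1.0, so the names are passed as (name, function) tuples, and the unit 's' is given explicitly so the values are read as seconds rather than nanoseconds.
df1 = df.groupby(['TargetUser', 'day_of_week'])\
        .agg({'seconds': [('mean', lambda x: pd.to_timedelta(x.mean(), unit='s')),
                          ('std', lambda x: pd.to_timedelta(x.std(ddof=1), unit='s'))]})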
Final output of df1 (std is NaT here because every group in the sample data contains only one row):
                        seconds
                           mean  std
TargetUser day_of_week
bob        Thu         02:20:01  NaT
joe        Tue         22:20:01  NaT
           Wed         20:20:01  NaT
smo        Sat         21:20:01  NaT
To remove the upper column level and turn the index into columns, you can then do this:
df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()  # reset_index returns a new frame, so assign it
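For the part of the question about converting whole series of seconds without a Python loop, here is a minimal end-to-end sketch. It assumes a pandas version with named aggregation (0.25 or later) and uses made-up sample rows with two observations per group so that the standard deviation is defined; the stats name is also just for illustration.

```python
import pandas as pd

# Made-up sample: two rows per (user, weekday) group so std is not NaN.
df = pd.DataFrame({
    'TargetUser': ['joe', 'joe', 'bob', 'bob'],
    'DateTime': pd.to_datetime([
        '2012-10-10 20:20:01', '2012-10-17 20:30:01',
        '2012-10-11 02:20:01', '2012-10-18 02:40:01',
    ]),
})
df['day_of_week'] = df['DateTime'].dt.strftime('%a')
df['seconds'] = (df['DateTime'] - df['DateTime'].dt.normalize()).dt.total_seconds()

# Named aggregation produces flat column names, so no droplevel is needed.
# 'std' uses the sample standard deviation (ddof=1) by default.
stats = (df.groupby(['TargetUser', 'day_of_week'])['seconds']
           .agg(mean='mean', std='std')
           .reset_index())

# Convert each column of seconds to timedeltas in one vectorized call,
# rounded to whole seconds for display.
for col in ['mean', 'std']:
    stats[col] = pd.to_timedelta(stats[col], unit='s').dt.round('s')

print(stats)
```

The mean and std columns end up as timedelta64 values, which pandas displays in a form such as 0 days 02:30:01.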
Upvotes: 2