Reputation: 1397
I am learning numpy through exercises and I'm having some trouble with this one. I have to write a function that takes an np_array as argument and returns a new np_array. The argument looks like:
>> log
array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')
log contains information about connections to a server. The first element of each row is a login date and the second is the corresponding logout date.
The new np_array should contain the number of hours the server was connected, per week, between the Monday preceding the first connection and the Monday following the last connection.
>> func(log)
array([[time_connected_week1,
time_connected_week2,
time_connected_week3,
...
time_connected_weekn]], dtype='timedelta64[s]')
week1 (weekn) must correspond to the first (last) week of the log array.
I have written the following code:
import numpy as np

def func(log):
    begin = np.datetime64("2015-05-04")  # first Monday
    end = np.datetime64("2015-12-07")    # last Monday
    week_td64 = np.timedelta64(1, 'W')
    nbWeek_td64 = int((end - begin) / week_td64)
    week = begin + np.arange(nbWeek_td64) * week_td64  # arange(week1, weekn)
    weekHours = []  # list to store return values
    for w in week:
        mask1 = log[:,0] >= w  # >= so a login exactly at Monday 00:00 is kept
        mask2 = log[:,0] < w + week_td64
        l = log[mask1 & mask2]  # log rows whose login falls in the current week
        totalweek = (l[:,1] - l[:,0]).sum()  # total connected time for this week
        weekHours.append(totalweek)  # save value
    return np.array(weekHours)
I've got two questions concerning my code:
1/ How can I find the first Monday automatically? np.datetime64 does not support weekday(). Do I have to use datetime.datetime?
2/ How can I get rid of the loop? I've been told that numpy is largely about getting rid of loops, and I am sure this can be done with fancy indexing.
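As a side note on question 1: datetime64 values count days from 1970-01-01, which was a Thursday, so the weekday can be recovered with plain integer arithmetic, without datetime.datetime and without a loop. A minimal sketch, assuming naive timestamps (no timezone offsets); the helper name monday_on_or_before is hypothetical:

```python
import numpy as np

def monday_on_or_before(ts):
    """Return the Monday (datetime64[D]) on or before each timestamp in ts."""
    days = np.asarray(ts).astype('datetime64[D]')
    # Day 0 (1970-01-01) was a Thursday, so adding 3 makes Monday == 0 ... Sunday == 6
    weekday = (days.astype(np.int64) + 3) % 7
    return days - weekday.astype('timedelta64[D]')

login = np.array(['2015-05-08T15:46:06', '2015-08-09T22:38:45',
                  '2015-12-07T09:00:00'], dtype='datetime64[s]')
mondays = monday_on_or_before(login)
print(mondays)  # Mondays 2015-05-04, 2015-08-03 and 2015-12-07
```

Because it works element-wise, the same call gives the week start of every login at once.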
Upvotes: 0
Views: 246
Reputation: 2728
Sorry, I missed this earlier. There is actually a much easier way to find which week each log entry belongs to, without np.tile and np.repeat.
All you have to do is compute the timedelta64 between each login and the starting Monday and divide it by one week; the floor of that ratio is the index of the week the entry belongs to:
import numpy as np

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
                ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
                ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
                ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
                ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
                ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
                ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
                ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
                ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
                ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')
login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))  # GetMonday as defined in my other answer
week_td64 = np.timedelta64(1, 'W')
weeks_entry = np.floor((login-begin)/week_td64)  # index of the week each login falls in
hours_spent = (logoff-login).astype('timedelta64[h]')  # truncated to whole hours
print(np.bincount(weeks_entry.astype('int64'), hours_spent.astype('float64')))
#[ 14. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 36.
# 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 0. 0. 0. 0. 0.
# 8.]
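The same idea can be wrapped into a self-contained function that returns timedelta64[s] per week, as the question asks. This is a sketch under stated assumptions, not the answer's code: hours_per_week is a hypothetical name, it finds the Mondays with np.busday_offset and a Monday-only weekmask instead of GetMonday, keeps exact seconds instead of truncating each session to whole hours, and passes minlength so that empty weeks up to the closing Monday are included. Like the snippet above, each session is counted entirely in the week of its login.

```python
import numpy as np

MONDAYS_ONLY = '1000000'  # busday_offset weekmask: only Monday is a valid day

def hours_per_week(log):
    """Connected time per week (timedelta64[s]), binned by the week of each login."""
    login, logoff = log[:, 0], log[:, 1]
    # Monday on or before the first login
    first = np.busday_offset(login.min().astype('M8[D]'), 0,
                             roll='backward', weekmask=MONDAYS_ONLY)
    # Monday strictly after the last logoff
    last = np.busday_offset(logoff.max().astype('M8[D]'), 1,
                            roll='backward', weekmask=MONDAYS_ONLY)
    n_weeks = int((last - first) // np.timedelta64(7, 'D'))
    # Week index of each login: floor-divide its offset from the first Monday by one week
    week_idx = (login - first.astype('M8[s]')) // np.timedelta64(1, 'W')
    seconds = (logoff - login) / np.timedelta64(1, 's')  # session lengths as floats
    totals = np.bincount(week_idx.astype(np.int64), weights=seconds, minlength=n_weeks)
    return np.rint(totals).astype(np.int64).astype('timedelta64[s]')

# Toy log without timezone offsets (hypothetical data)
log = np.array([['2015-05-08T13:46:06', '2015-05-08T15:21:36'],
                ['2015-05-11T10:00:00', '2015-05-11T12:30:00'],
                ['2015-05-22T08:00:00', '2015-05-22T09:00:00']], dtype='datetime64[s]')
res = hours_per_week(log)
print(res)  # seconds connected in the weeks starting 2015-05-04, 05-11 and 05-18
```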
Upvotes: 0
Reputation: 2728
For the first question (finding the first Monday automatically), you can use np.busday_offset with a weekmask that treats only Mondays as business days:
def first_monday(firstDay):
    firstEntry = firstDay.astype('M8[D]')
    # Roll forward to a Monday, then step back one Monday
    beforeMonday = np.busday_offset(firstEntry, -1, 'forward', [1,0,0,0,0,0,0])
    # If firstEntry was already a Monday, the step went back one week too far
    if firstEntry - beforeMonday == np.timedelta64(7, 'D'):
        return firstEntry
    else:
        return beforeMonday

firstDay = np.min(log[:, 0])
firstMonday = first_monday(firstDay)
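To see why first_monday needs the 7-day check, it helps to look at how the roll and the offset interact in np.busday_offset: the roll is applied first, and only to days that are not valid, and then the offset is counted in valid days. A small sketch with example dates chosen for illustration:

```python
import numpy as np

MONDAYS_ONLY = [1, 0, 0, 0, 0, 0, 0]  # weekmask: Monday is the only valid day
friday = np.datetime64('2015-05-08')
monday = np.datetime64('2015-05-04')

# 'forward' rolls an invalid day to the next Monday, then the offset is applied
a = np.busday_offset(friday, 0, 'forward', MONDAYS_ONLY)    # 2015-05-11
b = np.busday_offset(friday, -1, 'forward', MONDAYS_ONLY)   # 2015-05-04
# 'backward' rolls an invalid day to the previous Monday instead
c = np.busday_offset(friday, 0, 'backward', MONDAYS_ONLY)   # 2015-05-04
# A Monday is already valid, so it is not rolled and -1 goes back a full week
d = np.busday_offset(monday, -1, 'forward', MONDAYS_ONLY)   # 2015-04-27
print(a, b, c, d)
```

The last call is the case the check handles. Using roll='backward' with an offset of 0 returns the Monday on or before a date in a single call; that is an alternative to the check, not what the answer above uses.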
Tip: you can get rid of the loop by tiling the logins with np.tile() and repeating the weeks with np.repeat().
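To make the tip concrete, here is a toy sketch with small integers standing in for timestamps (all values hypothetical): np.tile and np.repeat together enumerate every (week, login) pair, and a mask then keeps the pairs where the login falls inside the week.

```python
import numpy as np

login = np.array([10, 25, 3])        # toy "timestamps"
week_start = np.array([0, 10, 20])   # toy week starts, each week is 10 units long
n_logs, n_weeks = len(login), len(week_start)

# np.tile repeats the whole login vector once per week: [10 25 3 10 25 3 10 25 3]
tiled = np.tile(login, n_weeks)
# np.repeat repeats each week once per login:          [ 0  0  0 10 10 10 20 20 20]
repeated = np.repeat(week_start, n_logs)
# Position k now pairs login k % n_logs with week k // n_logs
mask = (tiled >= repeated) & (tiled < repeated + 10)

week_of_login = np.repeat(np.arange(n_weeks), n_logs)[mask]   # week index of each match
login_of_match = np.tile(np.arange(n_logs), n_weeks)[mask]    # login index of each match
print(week_of_login, login_of_match)
```

The matches come out in week order (login_of_match is [2, 0, 1] here), so any per-login value, such as a session duration, has to be reordered with login_of_match before it is paired with week_of_login.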
FINAL ANSWER: Don't read this unless you give up.
First define a GetMonday function:
def GetMonday(firstDay, forward=False):
    firstEntry = firstDay.astype('M8[D]')
    # Monday on or before firstEntry (a Monday is not rolled)
    monday = np.busday_offset(firstEntry, 0, 'backward', [1,0,0,0,0,0,0])
    if forward:
        # Step to the Monday strictly after firstEntry
        monday = np.busday_offset(monday, 1, 'backward', [1,0,0,0,0,0,0])
    return monday.astype('M8[s]')
Then you can code:
import numpy as np

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
                ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
                ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
                ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
                ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
                ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
                ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
                ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
                ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
                ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')
login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))
end = GetMonday(np.max(logoff), True)
n_logs = log.shape[0]  # must be an int: np.repeat rejects float repeat counts
week_td64 = np.timedelta64(1, 'W')
nbWeek_td64 = int((end - begin) / week_td64)
week = begin + np.arange(nbWeek_td64) * week_td64
# Pair every login with every week: tile the logins, repeat the weeks
tiledLogin = np.tile(login, nbWeek_td64)
tiledIndex = np.tile(np.arange(n_logs), nbWeek_td64)  # log row of each tiled login
repeatedWeek = np.repeat(week, n_logs)
repeatedWeek_order = np.repeat(np.arange(nbWeek_td64), n_logs)
loginWeekMask = (tiledLogin >= repeatedWeek) & (tiledLogin < repeatedWeek + week_td64)
hours_spent = (logoff-login).astype('timedelta64[h]')
weeks_entry = repeatedWeek_order[loginWeekMask]
# Matches come out in week order, so reorder the durations to line up with weeks_entry
weights = hours_spent[tiledIndex[loginWeekMask]].astype('float64')
print(np.bincount(weeks_entry, weights))
#[ 14. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 36.
# 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 0. 0. 0. 0. 0.
# 8.]
This gives you an array with the hours per week. It is not quite the right final answer, because a session (login to logoff) may span more than one week and is counted entirely in the week of its login, but I will leave it to you to figure out a way around that.
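One way to handle sessions that cross a week boundary is to clip every session to every week and sum the overlaps. The sketch below swaps in broadcasting instead of np.tile/np.repeat (the 2-D grid is the same week-by-login pairing); time_per_week_split and its first_monday/n_weeks parameters are hypothetical, and first_monday could come from GetMonday or first_monday above. It needs memory proportional to n_weeks times n_logs, which is fine for logs of this size.

```python
import numpy as np

def time_per_week_split(log, first_monday, n_weeks):
    """Time connected in each of n_weeks weeks, splitting sessions at week boundaries."""
    login = log[:, 0].astype('M8[s]')
    logoff = log[:, 1].astype('M8[s]')
    week = np.timedelta64(7 * 24 * 3600, 's')
    starts = np.datetime64(first_monday, 's') + np.arange(n_weeks) * week  # (n_weeks,)
    ends = starts + week
    # Intersect [login, logoff) with [start, end) for every (week, session) pair;
    # rows are weeks, columns are sessions
    lo = np.maximum(login[None, :], starts[:, None])
    hi = np.minimum(logoff[None, :], ends[:, None])
    overlap = np.maximum(hi - lo, np.timedelta64(0, 's'))  # no overlap -> 0
    return overlap.sum(axis=1)  # timedelta64[s] per week

# Hypothetical toy log: the first session starts on a Friday and ends on the next Monday
log = np.array([['2015-05-08T20:00:00', '2015-05-11T02:00:00'],
                ['2015-05-12T10:00:00', '2015-05-12T11:00:00']], dtype='datetime64[s]')
res = time_per_week_split(log, '2015-05-04', 2)
print(res)  # 52 h in the week of 2015-05-04, 2 h + 1 h in the week of 2015-05-11
```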
Upvotes: 1