Reputation: 8077
Considering the following dataframe:
df = pd.read_json("""{"week":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":2,"7":2,"8":2,"9":2,"10":2,"11":2,"12":3,"13":3,"14":3,"15":3,"16":3,"17":3},"extra_hours":{"0":"01:00:00","1":"00:00:00","2":"01:00:00","3":"01:00:00","4":"00:00:00","5":"01:00:00","6":"01:00:00","7":"01:00:00","8":"01:00:00","9":"01:00:00","10":"00:00:00","11":"01:00:00","12":"01:00:00","13":"02:00:00","14":"01:00:00","15":"02:00:00","16":"00:00:00","17":"00:00:00"},"extra_hours_over":{"0":null,"1":null,"2":null,"3":null,"4":null,"5":null,"6":null,"7":null,"8":null,"9":null,"10":null,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null}}""")
df.tail(6)
week extra_hours extra_hours_over
12 3 01:00:00 NaN
13 3 02:00:00 NaN
14 3 01:00:00 NaN
15 3 02:00:00 NaN
16 3 00:00:00 NaN
17 3 00:00:00 NaN
Now, in every week, the maximum amount of extra_hours
is 4h, meaning I have to subtract 30min blocks from extra_hour
column, and fill the extra_hour_over
column, so that in every week, total sum of extra_hour
has a maximum of 4h.
So, given the example dataframe, a possible solution (for week 3) would be like this:
week extra_hours extra_hours_over
12 3 01:00:00 00:00:00
13 3 01:30:00 00:30:00
14 3 00:30:00 00:30:00
15 3 01:00:00 01:00:00
16 3 00:00:00 00:00:00
17 3 00:00:00 00:00:00
I would need to aggregate total extra_hours
per week, check in which days it passes 4h, and then randomly subtract half-hour chunks.
What would be the easiest/most direct way to achieve this?
Upvotes: 2
Views: 84
Reputation: 3752
Here goes one attempt for what you seem to be asking. The idea is simple, although the code fairly verbose:
1) Create some helper variables (minutes, extra_minutes, total for the week)
2) Loop through a temporary dataset that will contain only while sum is > 240 minutes.
3) In the loop, use random.choice
to select a time to remove 30 min from.
4) Apply the changes to minutes and extra minutes
The code:
df = pd.read_json("""{"week":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":2,"7":2,"8":2,"9":2,"10":2,"11":2,"12":3,"13":3,"14":3,"15":3,"16":3,"17":3},"extra_hours":{"0":"01:00:00","1":"00:00:00","2":"01:00:00","3":"01:00:00","4":"00:00:00","5":"01:00:00","6":"01:00:00","7":"01:00:00","8":"01:00:00","9":"01:00:00","10":"00:00:00","11":"01:00:00","12":"01:00:00","13":"02:00:00","14":"01:00:00","15":"02:00:00","16":"00:00:00","17":"00:00:00"},"extra_hours_over":{"0":null,"1":null,"2":null,"3":null,"4":null,"5":null,"6":null,"7":null,"8":null,"9":null,"10":null,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null}}""")
df['minutes'] = pd.DatetimeIndex(df['extra_hours']).hour * 60 + pd.DatetimeIndex(df['extra_hours']).minute
df['extra_minutes'] = 0
df['tot_time'] = df.groupby('week')['minutes'].transform('sum')
while not df[df['tot_time'] > 240].empty:
mask = df[(df['minutes']>=30)&(df['tot_time']>240)].groupby('week').apply(lambda x: np.random.choice(x.index)).values
df.loc[mask,'minutes'] -= 30
df.loc[mask,'extra_minutes'] += 30
df['tot_time'] = df.groupby('week')['minutes'].transform('sum')
df['extra_hours_over'] = df['extra_minutes'].apply(lambda x: pd.Timedelta(minutes=x))
df['extra_hours'] = df['minutes'].apply(lambda x: pd.Timedelta(minutes=x))
df.drop(['minutes','extra_minutes'], axis=1).tail(6)
Out[1]:
week extra_hours extra_hours_over tot_time
12 3 00:30:00 00:30:00 240
13 3 01:30:00 00:30:00 240
14 3 00:30:00 00:30:00 240
15 3 01:30:00 00:30:00 240
16 3 00:00:00 00:00:00 240
17 3 00:00:00 00:00:00 240
Note: Because I am using np.random.choice
, the same observation can be picked twice, which will make that observation change by a chunk of more than 30 min.
Upvotes: 1