Reputation: 173
I have dataframe:
data = {'startTime':['01-06-2010 09:00:00', '13-02-2016 09:00:00', '18-03-2018 09:00:00', '23-05-2011 09:00:00'], 'endTime':['02-06-2010 17:00:00', '14-02-2016 17:00:00', '19-03-2018 17:00:00', '24-05-2011 17:00:00'], 'durationInMinutes': [1440, 1440, 1440, 1440]}
df = pd.DataFrame(data)
I want to make additional rows by dividing 1440 minutes into equal intervals of 8 hours per day. So, 1440 minutes would be equal to 3 days (3 additional rows) (9am - 5pm). Minutes can be more than 1440. New rows in startTime and endTime would be:
startTime endTime
01-06-2010 09:00:00 01-06-2010 17:00:00
02-06-2010 09:00:00 02-06-2010 17:00:00
03-06-2010 09:00:00 03-06-2010 17:00:00
13-02-2016 09:00:00 13-02-2016 17:00:00
14-02-2016 09:00:00 14-02-2016 17:00:00
15-02-2016 09:00:00 15-02-2016 17:00:00
Can anyone please help me with this? Thanks.
Upvotes: 0
Views: 147
Reputation: 9379
UPDATE #2:
Here is a way to do what the question asks, as clarified by multiple OP comments, which is to:
durationInMinutes
column value in the input.durationInMinutes
for each result row to match the above logic.startTime
of each result row is 09:00
on the appropriate date.endTime
of each result row is a period of durationInMinutes
later than startTime
to match the above logic.df['days'] = df.durationInMinutes // (8 * 60) + (df.durationInMinutes % (8 * 60) > 0)
df['durationInMinutes'] = df.apply(lambda x: [8 * 60] * (x.days - 1) +
[x.durationInMinutes % (8 * 60) + (x.durationInMinutes % (8 * 60) == 0) * 8 * 60], axis=1)
df['daysToAdd'] = df.days.apply(lambda x: range(x))
df = df.explode(['durationInMinutes', 'daysToAdd'])
df.startTime = pd.to_datetime(df.startTime, dayfirst=True)
df.startTime = pd.to_datetime(pd.DataFrame({
'year':df.startTime.dt.year, 'month':df.startTime.dt.month,
'day':df.startTime.dt.day + df.daysToAdd,
'hour':[9]*len(df.index)}))
df.endTime = (df.startTime.astype('int64') +
df.durationInMinutes * 60*1_000_000_000).astype('datetime64[ns]')
df = df.drop(columns=['days', 'daysToAdd']).reset_index(drop=True)
Input:
startTime endTime durationInMinutes
0 01-06-2010 09:00:00 02-06-2010 17:00:00 475
1 13-02-2016 08:30:00 14-02-2016 17:00:00 510
2 18-03-2018 09:30:00 19-03-2018 17:00:00 1440
3 23-05-2011 09:00:00 24-05-2011 17:00:00 1440
Output:
startTime endTime durationInMinutes
0 2010-06-01 09:00:00 2010-06-01 16:55:00 475
1 2016-02-13 09:00:00 2016-02-13 17:00:00 480
2 2016-02-14 09:00:00 2016-02-14 09:30:00 30
3 2018-03-18 09:00:00 2018-03-18 17:00:00 480
4 2018-03-19 09:00:00 2018-03-19 17:00:00 480
5 2018-03-20 09:00:00 2018-03-20 17:00:00 480
6 2011-05-23 09:00:00 2011-05-23 17:00:00 480
7 2011-05-24 09:00:00 2011-05-24 17:00:00 480
8 2011-05-25 09:00:00 2011-05-25 17:00:00 480
UPDATED: Here is a way to do what you have asked in your question and clarified in your comments:
df['days'] = df.durationInMinutes // (8 * 60) +
(df.durationInMinutes % (8 * 60) > 0).astype(int)
df['durationInMinutes'] = df.apply(lambda x: [8 * 60] * (x.days - 1) +
[x.durationInMinutes % (8 * 60) +
(x.durationInMinutes % (8 * 60) == 0) * 8 * 60], axis=1)
df['daysToAdd'] = df.days.apply(lambda x: range(x))
df = df.explode(['durationInMinutes', 'daysToAdd'])
df.startTime = (pd.to_datetime(df.startTime, dayfirst=True).astype('int64') +
df.daysToAdd * 24*60*60*1_000_000_000).astype('datetime64[ns]')
df.endTime = (df.startTime.astype('int64') +
df.durationInMinutes * 60*1_000_000_000).astype('datetime64[ns]')
df = df.drop(columns=['days', 'daysToAdd']).reset_index(drop=True)
Explanation:
days
with number of result rows corresponding to each input row.durationInMinutes
column to contain a list for each row of the minutes in each result row originating from the input row.daysToAdd
with a list for each row of the number of days to add to the startTime for each result row originating from the input row.explode()
to create result rows with one value each from the lists in durationInMinutes
and daysToAdd
.daysToAdd
to startTime
.endTime
to be startTime
plus the nanosecond equivalent of durationInMinutes
.drop()
to eliminate unneeded columns, and use reset_index()
to get an integer range index starting at 0 that advances by 1 for each row.Input:
startTime endTime durationInMinutes
0 01-06-2010 09:00:00 02-06-2010 17:00:00 1445
1 13-02-2016 09:00:00 14-02-2016 17:00:00 1435
2 18-03-2018 09:00:00 19-03-2018 17:00:00 1440
3 23-05-2011 09:00:00 24-05-2011 17:00:00 1440
Output:
startTime endTime durationInMinutes
0 2010-06-01 09:00:00 2010-06-01 17:00:00 480
1 2010-06-02 09:00:00 2010-06-02 17:00:00 480
2 2010-06-03 09:00:00 2010-06-03 17:00:00 480
3 2010-06-04 09:00:00 2010-06-04 09:05:00 5
4 2016-02-13 09:00:00 2016-02-13 17:00:00 480
5 2016-02-14 09:00:00 2016-02-14 17:00:00 480
6 2016-02-15 09:00:00 2016-02-15 16:55:00 475
7 2018-03-18 09:00:00 2018-03-18 17:00:00 480
8 2018-03-19 09:00:00 2018-03-19 17:00:00 480
9 2018-03-20 09:00:00 2018-03-20 17:00:00 480
10 2011-05-23 09:00:00 2011-05-23 17:00:00 480
11 2011-05-24 09:00:00 2011-05-24 17:00:00 480
12 2011-05-25 09:00:00 2011-05-25 17:00:00 480
ORIGINAL ANSWER:
Here's a way to do what your question asks:
df = pd.concat([df.assign(durationInMinutes=df.durationInMinutes/3,
orig_row=i).reset_index() for i in range(3)])
for col in ['startTime', 'endTime']:
df[col] = (pd.to_datetime(df[col], dayfirst=True).astype('int64') +
df.orig_row * 24*60*60*1_000_000_000).astype('datetime64[ns]')
df = df.sort_values('index').drop(columns=['index', 'orig_row'])
Explanation:
durationInMinutes
column by dividing its contents by 3.df
, each with a new column orig_row
containing an integer corresponding to which number the copy is (0, 1 or 2).startTime
and endTime
, convert the string value to a datetime in nanoseconds and add the nanosecond equivalent of the number of days in orig_row
to it (24 hours * 60 minutes * 60 seconds * 1bn nanoseconds
).Input:
startTime endTime durationInMinutes
0 01-06-2010 09:00:00 02-06-2010 17:00:00 1440
1 13-02-2016 09:00:00 14-02-2016 17:00:00 1440
2 18-03-2018 09:00:00 19-03-2018 17:00:00 1440
3 23-05-2011 09:00:00 24-05-2011 17:00:00 1440
Output:
startTime endTime durationInMinutes
0 2010-06-01 09:00:00 2010-06-02 17:00:00 480.0
0 2010-06-02 09:00:00 2010-06-03 17:00:00 480.0
0 2010-06-03 09:00:00 2010-06-04 17:00:00 480.0
1 2016-02-13 09:00:00 2016-02-14 17:00:00 480.0
1 2016-02-14 09:00:00 2016-02-15 17:00:00 480.0
1 2016-02-15 09:00:00 2016-02-16 17:00:00 480.0
2 2018-03-18 09:00:00 2018-03-19 17:00:00 480.0
2 2018-03-19 09:00:00 2018-03-20 17:00:00 480.0
2 2018-03-20 09:00:00 2018-03-21 17:00:00 480.0
3 2011-05-23 09:00:00 2011-05-24 17:00:00 480.0
3 2011-05-24 09:00:00 2011-05-25 17:00:00 480.0
3 2011-05-25 09:00:00 2011-05-26 17:00:00 480.0
TROUBLESHOOTING:
IN a comment, OP mentions getting ValueError: columns must have matching element counts
in df.explode(). A print(df)
just before the explode()
line in the UPDATED
solution in my environment gives:
startTime endTime durationInMinutes days daysToAdd
0 01-06-2010 09:00:00 02-06-2010 17:00:00 [480, 480, 480, 5] 4 (0, 1, 2, 3)
1 13-02-2016 09:00:00 14-02-2016 17:00:00 [480, 480, 475] 3 (0, 1, 2)
2 18-03-2018 09:00:00 19-03-2018 17:00:00 [480, 480, 480] 3 (0, 1, 2)
3 23-05-2011 09:00:00 24-05-2011 17:00:00 [480, 480, 480] 3 (0, 1, 2)
Upvotes: 1
Reputation: 260640
IIUC, you can use:
# ensure datetime
df[['startTime', 'endTime']] = df[['startTime', 'endTime']].apply(pd.to_datetime, dayfirst=True)
# compute number of rows in days
extra = np.ceil(df['durationInMinutes'].div(60*8)).astype(int)
# compute a shift (+0, +1, +2days etc.)
shift = extra.repeat(extra).groupby(level=0).cumcount().mul(pd.Timedelta('1day'))
# duplicate the rows
df2 = df.loc[df.index.repeat(extra)].reset_index(drop=True)
# add the shift
df2[['startTime', 'endTime']] = df2[['startTime', 'endTime']].add(shift.values, axis=0)
print(df2)
output:
startTime endTime durationInMinutes
0 2010-06-01 09:00:00 2010-06-02 17:00:00 1440
1 2010-06-02 09:00:00 2010-06-03 17:00:00 1440
2 2010-06-03 09:00:00 2010-06-04 17:00:00 1440
3 2016-02-13 09:00:00 2016-02-14 17:00:00 1440
4 2016-02-14 09:00:00 2016-02-15 17:00:00 1440
5 2016-02-15 09:00:00 2016-02-16 17:00:00 1440
6 2018-03-18 09:00:00 2018-03-19 17:00:00 1440
7 2018-03-19 09:00:00 2018-03-20 17:00:00 1440
8 2018-03-20 09:00:00 2018-03-21 17:00:00 1440
9 2011-05-23 09:00:00 2011-05-24 17:00:00 1440
10 2011-05-24 09:00:00 2011-05-25 17:00:00 1440
11 2011-05-25 09:00:00 2011-05-26 17:00:00 1440
Upvotes: 1