Miguel Lebon
Miguel Lebon

Reputation: 49

Duplicate rows with a new column

So I have a dataframe like this

df1 = {'id':[123,347],
   'shift_start':['2022-01-25 09:00:00', '2022-01-28 19:00:00'],
   'shift_end':['2022-01-25 12:30:00', '2022-01-29 00:00:00']}

df1 = pd.DataFrame(df1,columns=['id','shift_start', 'shift_end'])
id  shift_start         shift_end
123 2022-01-25 09:00:00 2022-01-28 12:30:00
347 2022-01-25 23:00:00 2022-01-29 00:00:00

And I would like to get for every id how many 30min slots I have in each shift, so something like this

id  shift_start         shift_end           slot_start          slot_end
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 09:00:00 2022-01-25 09:30:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 09:30:00 2022-01-25 10:00:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 10:00:00 2022-01-25 10:30:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 10:30:00 2022-01-25 11:00:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 11:00:00 2022-01-25 11:30:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 11:30:00 2022-01-25 12:00:00
123 2022-01-25 09:00:00 2022-01-28 12:30:00 2022-01-25 12:00:00 2022-01-25 12:30:00
347 2022-01-25 23:00:00 2022-01-29 00:00:00 2022-01-25 23:00:00 2022-01-29 23:30:00
347 2022-01-25 23:00:00 2022-01-29 00:00:00 2022-01-25 23:30:00 2022-01-29 00:00:00

I've found a way to find the 30min slots but I still cannot find the way to add them into the dataframe

def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta

for index, row in df1.iterrows():
    dts = [dt.strftime('%Y-%m-%d %H:%M:%S') for dt in
        datetime_range(datetime(row['shift_start'].year, row['shift_start'].month, row['shift_start'].day, row['shift_start'].hour, row['shift_start'].minute),
                       datetime(row['shift_end'].year, row['shift_end'].month, row['shift_end'].day, row['shift_end'].hour, row['shift_end'].minute),
        timedelta(minutes=30))]

    print(dts)

Any help will be very welcome. Thanks!

Upvotes: 1

Views: 57

Answers (1)

fsl
fsl

Reputation: 3280

Here is one way:

slots = df1[['shift_start', 'shift_end']]
    .apply(lambda x: pd.date_range(*x, freq='30min'), axis=1)
    .explode()
    .rename('slot_start')
    .to_frame()

slots['slot_end'] = slots['slot_start'].shift(-1)

df1 = pd.concat([df1, slots], axis=1).dropna()
df1 = df1[(df1.shift_end >= df1.slot_end)]                                                     

Output:

   id          shift_start            shift_end          slot_start            slot_end
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 09:00:00 2022-01-25 09:30:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 09:30:00 2022-01-25 10:00:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 10:00:00 2022-01-25 10:30:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 10:30:00 2022-01-25 11:00:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 11:00:00 2022-01-25 11:30:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 11:30:00 2022-01-25 12:00:00
0  123  2022-01-25 09:00:00  2022-01-25 12:30:00 2022-01-25 12:00:00 2022-01-25 12:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 19:00:00 2022-01-28 19:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 19:30:00 2022-01-28 20:00:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 20:00:00 2022-01-28 20:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 20:30:00 2022-01-28 21:00:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 21:00:00 2022-01-28 21:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 21:30:00 2022-01-28 22:00:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 22:00:00 2022-01-28 22:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 22:30:00 2022-01-28 23:00:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 23:00:00 2022-01-28 23:30:00
1  347  2022-01-28 19:00:00  2022-01-29 00:00:00 2022-01-28 23:30:00 2022-01-29 00:00:00

Upvotes: 2

Related Questions