Is there a faster way to iterate through a DataFrame?

Question

I am looping over a pandas DataFrame of time slots, comparing each slot with the other slots on the same day to find double bookings.

The script takes a while to run. Is there a faster way to do this?

df_temp = pd.DataFrame()
for date in df_cal["date"].unique():
    df_date = df_cal[df_cal["date"]==date]
    for current in range(len(df_date)):
        for comp in range(current+1,df_date[df_date["Start"]



Columns are [["MEET_ID","date","Start","End","double_booked","Time_removed"]]

[[1943,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 09:00:00'),
  Timestamp('2017-05-01 09:30:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1907,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 10:00:00'),
  Timestamp('2017-05-01 11:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1913,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 11:00:00'),
  Timestamp('2017-05-01 12:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1956,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 12:00:00'),
  Timestamp('2017-05-01 12:30:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1905,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 12:30:00'),
  Timestamp('2017-05-01 13:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1914,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 12:30:00'),
  Timestamp('2017-05-01 13:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1940,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 13:00:00'),
  Timestamp('2017-05-01 16:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1958,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 14:30:00'),
  Timestamp('2017-05-01 15:30:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1892,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 16:00:00'),
  Timestamp('2017-05-01 16:30:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1929,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 16:30:00'),
  Timestamp('2017-05-01 17:00:00'),
  False,
  Timedelta('0 days 00:00:00')],
 [1887,
  Timestamp('2017-05-01 00:00:00'),
  Timestamp('2017-05-01 17:30:00'),
  Timestamp('2017-05-01 18:00:00'),
  False,
  Timedelta('0 days 00:00:00')]]


This should then yield something like the following, where double-booked meetings are marked as such and the overlapping time is removed from one of the meetings (here, from the second one).
Columns are [["MEET_ID","Start","End","Time_removed","double_booked"]]

[[1943,
  Timestamp('2017-05-01 09:00:00'),
  Timestamp('2017-05-01 09:30:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1907,
  Timestamp('2017-05-01 10:00:00'),
  Timestamp('2017-05-01 11:00:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1913,
  Timestamp('2017-05-01 11:00:00'),
  Timestamp('2017-05-01 12:00:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1956,
  Timestamp('2017-05-01 12:00:00'),
  Timestamp('2017-05-01 12:30:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1905,
  Timestamp('2017-05-01 12:30:00'),
  Timestamp('2017-05-01 13:00:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1914,
  Timestamp('2017-05-01 12:30:00'),
  Timestamp('2017-05-01 13:00:00'),
  Timedelta('0 days 00:30:00'),
  True],
 [1940,
  Timestamp('2017-05-01 13:00:00'),
  Timestamp('2017-05-01 16:00:00'),
  Timedelta('0 days 00:00:00'),
  True],
 [1958,
  Timestamp('2017-05-01 14:30:00'),
  Timestamp('2017-05-01 15:30:00'),
  Timedelta('0 days 01:00:00'),
  True],
 [1892,
  Timestamp('2017-05-01 16:00:00'),
  Timestamp('2017-05-01 16:30:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1929,
  Timestamp('2017-05-01 16:30:00'),
  Timestamp('2017-05-01 17:00:00'),
  Timedelta('0 days 00:00:00'),
  False],
 [1887,
  Timestamp('2017-05-01 17:30:00'),
  Timestamp('2017-05-01 18:00:00'),
  Timedelta('0 days 00:00:00'),
  False]]
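
One way to avoid the nested loops might look like the sketch below. It is not the original script: it assumes each meeting only needs to be compared with the meeting directly before it on the same date (after sorting by Start), and it flags only the meeting that loses time. The helper name `mark_adjacent_overlaps` and the use of MEET_ID as a tie-breaker for equal Start times are assumptions made for illustration.

```python
import pandas as pd


def mark_adjacent_overlaps(df):
    """Hypothetical helper: compare each meeting with the previous one that day."""
    # MEET_ID is used only as an assumed tie-breaker for equal Start times.
    out = df.sort_values(["date", "Start", "MEET_ID"]).reset_index(drop=True)
    # End of the previous meeting on the same date (NaT for the first one).
    prev_end = out.groupby("date")["End"].shift()
    # The overlap ends at the earlier of the two End times.
    overlap_end = out["End"].where(out["End"] <= prev_end, prev_end)
    removed = overlap_end - out["Start"]
    # Negative or missing values mean no overlap, so they become zero.
    zero = pd.Timedelta(0)
    out["Time_removed"] = removed.where(removed > zero, zero)
    out["double_booked"] = out["Time_removed"] > zero
    return out


# Six of the meetings from the sample data above.
df_cal = pd.DataFrame({
    "MEET_ID": [1956, 1905, 1914, 1940, 1958, 1892],
    "date": pd.Timestamp("2017-05-01"),
    "Start": pd.to_datetime([
        "2017-05-01 12:00", "2017-05-01 12:30", "2017-05-01 12:30",
        "2017-05-01 13:00", "2017-05-01 14:30", "2017-05-01 16:00",
    ]),
    "End": pd.to_datetime([
        "2017-05-01 12:30", "2017-05-01 13:00", "2017-05-01 13:00",
        "2017-05-01 16:00", "2017-05-01 15:30", "2017-05-01 16:30",
    ]),
})

result = mark_adjacent_overlaps(df_cal)
print(result[["MEET_ID", "Start", "End", "Time_removed", "double_booked"]])
```

On these rows the sketch removes 30 minutes from 1914 and 1 hour from 1958, matching the Time_removed values in the expected output. Unlike the expected output, it does not flag 1940, because only the later meeting of each overlapping pair is marked.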


Edit (new data, 09/07/2018):

    Start               End                 Time_removed  Double booked
77  2018-07-02 00:00:00 2018-07-02 10:00:00 00:00:00      True
78  2018-07-02 03:00:00 2018-07-02 08:00:00 05:00:00      True
79  2018-07-02 03:00:00 2018-07-02 08:00:00 05:00:00      True
80  2018-07-02 04:30:00 2018-07-02 09:30:00 03:30:00      True
81  2018-07-02 05:00:00 2018-07-02 10:00:00 04:30:00      True
82  2018-07-02 05:00:00 2018-07-02 10:00:00 05:00:00      True


Row 80 should have 5 hours removed, but only 3:30 is removed because it is compared only with the row directly before it. The Time_removed between rows 77 and 80 must have been computed earlier, but it then gets overwritten by a smaller time difference.
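
The behaviour described above suggests comparing each meeting with the latest End seen so far that day, rather than with the previous row only. A minimal sketch of that idea, assuming meetings are grouped by the calendar day of Start and sorted by Start, could use a per-day running maximum of End. The helper name `remove_overlap_running_max` and the temporary `day` column are hypothetical.

```python
import pandas as pd


def remove_overlap_running_max(df):
    """Hypothetical helper: compare each meeting with the latest End seen so far."""
    out = df.copy()
    # Assumption: meetings are grouped by the calendar day of their Start.
    out["day"] = out["Start"].dt.normalize()
    out = out.sort_values(["day", "Start"])
    # Running maximum of End within each day, then shifted by one row so that
    # each meeting sees only the meetings that started before it.
    running_end = out.groupby("day")["End"].cummax()
    prev_latest_end = running_end.groupby(out["day"]).shift()
    # The overlap ends at the earlier of this End and the latest earlier End.
    overlap_end = out["End"].where(out["End"] <= prev_latest_end, prev_latest_end)
    removed = overlap_end - out["Start"]
    # Negative or missing values mean no overlap, so they become zero.
    zero = pd.Timedelta(0)
    out["Time_removed"] = removed.where(removed > zero, zero)
    return out.drop(columns="day")


# Rows 77 to 82 from the edit above (Start and End only).
df_new = pd.DataFrame(
    {
        "Start": ["2018-07-02 00:00", "2018-07-02 03:00", "2018-07-02 03:00",
                  "2018-07-02 04:30", "2018-07-02 05:00", "2018-07-02 05:00"],
        "End": ["2018-07-02 10:00", "2018-07-02 08:00", "2018-07-02 08:00",
                "2018-07-02 09:30", "2018-07-02 10:00", "2018-07-02 10:00"],
    },
    index=[77, 78, 79, 80, 81, 82],
)
df_new["Start"] = pd.to_datetime(df_new["Start"])
df_new["End"] = pd.to_datetime(df_new["End"])

result = remove_overlap_running_max(df_new)
print(result)
```

With this sketch, row 80 is measured against row 77's End of 10:00 and loses the full 5 hours. Row 81 also comes out at 5 hours instead of 4:30, since it too lies entirely inside row 77.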

