Calculating the daily sum in pandas dataframe

Question

I have a dataframe which looks like below and I am trying to calculate the daily dose for each drug. In the below table I have one drug but in the actual table I have multiple drugs.

Let me give a brief explanation of the table and the scenario. The ID column is the unique patient id, The start, and end column says when the medication has started and ended for that rate, the hours' column says # of hours between start and end for that ID, the rate column says at what rate the medication was given during that period of time for that ID. The way the table is designed is there is a start and end date for each rate for every ID.

In a hospital setting, if they want to calculate the total daily dose, they calculate from 7 AM today to 7 AM the next day rather than doing it from midnight to midnight.

  ID           START                   END            DRUG     RATE  HOURS
15114299  2011-03-18 18:23:00  2011-03-19 20:59:59  morphine   0.03   26.6
15114299  2011-03-19 21:00:00  2011-03-20 01:29:59  morphine    0.1   4.4
15340823  2011-02-18 16:00:00  2011-02-19 03:09:59  morphine    0.1   11.16
15340823  2011-02-19 15:30:00  2011-02-19 21:59:59  morphine    0.05  6.49
15621352  2011-04-20 03:44:00  2011-04-20 13:41:59  morphine    0.05  9.96
15621352  2011-04-20 13:42:00  2011-04-20 23:59:59  morphine    0.5   10.29
15775791  2011-07-21 19:00:00  2011-07-24 03:59:59  morphine    0.1   56.99
15775791  2011-07-24 04:00:00  2011-07-24 14:14:59  morphine    0.05  10.24
15947559  2011-07-05 19:40:00  2011-07-06 05:43:59  morphine    0.1   10.06
15947559  2011-07-06 05:44:00  2011-07-09 01:59:59  morphine    0.15  68.26
15947559  2011-07-09 02:00:00  2011-07-09 18:59:59  morphine    0.1   16.99
15947559  2011-07-14 19:30:00  2011-07-15 18:29:59  morphine    0.1   22.99
15947559  2011-07-15 18:30:00  2011-07-17 02:59:59  morphine    0.15  32.49
15947559  2011-07-17 03:00:00  2011-07-17 08:59:59  morphine    0.1   5.99
15947559  2011-07-17 09:00:00  2011-07-17 16:59:59  morphine    0.075 7.99

As we can see in the example dataframe, the first medication was for 26.6 hours and what I want to do is split that record into two which looks like below and having a day column would be useful to group by and get the sum for each day in the end.

Expected DF:

  ID           START                   END            DRUG     RATE     HOURS   Days
15114299  2011-03-18 18:23:00  2011-03-19 06:59:59  morphine   0.03     12.62   Day1
15114299  2011-03-19 07:00:00  2011-03-19 20:59:59  morphine   0.03     14      Day2
15114299  2011-03-19 21:00:00  2011-03-20 01:29:59  morphine    0.1     4.5     Day2

15340823  2011-02-18 16:00:00  2011-02-19 03:09:59  morphine    0.1     11.16   Day1
15340823  2011-02-19 15:30:00  2011-02-19 21:59:59  morphine    0.05    6.49    Day2

15621352  2011-04-20 03:44:00  2011-04-20 06:59:59  morphine    0.05    3.27    Day1
15621352  2011-04-20 07:00:00  2011-04-20 13:41:59  morphine    0.05    6.70    Day2
15621352  2011-04-20 13:42:00  2011-04-20 23:59:59  morphine    0.5     10.29   Day2

15775791  2011-07-21 19:00:00  2011-07-22 06:59:59  morphine    0.1     12      Day1
15775791  2011-07-22 07:00:00  2011-07-23 06:59:59  morphine    0.1     24      Day2        
15775791  2011-07-23 07:00:00  2011-07-24 03:59:59  morphine    0.1     21      Day3
15775791  2011-07-24 04:00:00  2011-07-24 06:59:59  morphine    0.05    3       Day3
15775791  2011-07-24 07:00:00  2011-07-24 14:14:59  morphine    0.05    7.25    Day4

15947559  2011-07-05 19:40:00  2011-07-06 05:43:59  morphine    0.1     10.06   Day1
15947559  2011-07-06 05:44:00  2011-07-06 06:59:59  morphine    0.15    1.27    Day1 
15947559  2011-07-06 07:00:00  2011-07-07 06:59:59  morphine    0.15    24      Day2
15947559  2011-07-07 07:00:00  2011-07-08 06:59:59  morphine    0.15    24      Day3
15947559  2011-07-08 07:00:00  2011-07-09 01:59:59  morphine    0.15    19      Day4
15947559  2011-07-09 02:00:00  2011-07-09 06:59:59  morphine    0.1     5       Day4
15947559  2011-07-09 07:00:00  2011-07-09 18:59:59  morphine    0.1     12      Day5
15947559  2011-07-14 19:30:00  2011-07-15 06:59:59  morphine    0.1     11.50   Day6
15947559  2011-07-15 07:00:00  2011-07-15 18:29:59  morphine    0.1     11.50   Day7
15947559  2011-07-15 18:30:00  2011-07-16 06:59:59  morphine    0.15    12.50   Day7
15947559  2011-07-16 07:00:00  2011-07-17 02:59:59  morphine    0.15    20      Day8
15947559  2011-07-17 03:00:00  2011-07-17 06:59:59  morphine    0.1     4       Day8
15947559  2011-07-17 07:00:00  2011-07-17 08:59:59  morphine    0.1     2       Day9
15947559  2011-07-17 09:00:00  2011-07-17 16:59:59  morphine    0.075   8       Day9

(I created the spaces between two ID's just to be more clear and understanding)

I tried the following code.

test = pd.concat([pd.DataFrame({'START': pd.date_range(row['START'], row['END'], freq='D'),
               'PERSON_ID': row['ID'],
               'DRUG': row['DRUG'], 
               'RATE': row['RATE']}, columns=['ID', 'START', 'DRUG', 'RATE'])
           for i, row in df.iterrows()], ignore_index=False)

but this creates a dataframe in the following way.

ID              START          DRUG       RATE
15114299    2011-03-18 18:23:00 morphine    0.03
15114299    2011-03-19 18:23:00 morphine    0.03
15114299    2011-03-19 21:00:00 morphine    0.1
15340823    2011-02-18 16:00:00 morphine    0.1
15340823    2011-02-19 15:30:00 morphine    0.05
15621352    2011-04-20 03:44:00 morphine    0.05
15621352    2011-04-20 13:42:00 morphine    0.5
15775791    2011-07-21 19:00:00 morphine    0.1
15775791    2011-07-22 19:00:00 morphine    0.1
15775791    2011-07-23 19:00:00 morphine    0.1
15775791    2011-07-24 04:00:00 morphine    0.05
15947559    2011-07-05 19:40:00 morphine    0.1
15947559    2011-07-06 05:44:00 morphine    0.15
15947559    2011-07-07 05:44:00 morphine    0.15
15947559    2011-07-08 05:44:00 morphine    0.15
15947559    2011-07-09 02:00:00 morphine    0.1
15947559    2011-07-14 19:30:00 morphine    0.1
15947559    2011-07-15 18:30:00 morphine    0.15
15947559    2011-07-16 18:30:00 morphine    0.15
15947559    2011-07-17 03:00:00 morphine    0.1
15947559    2011-07-17 09:00:00 morphine    0.075

I'd really appreciate if someone can help me on how to approach this problem. Happy to provide additional details.

Calculating the daily sum in pandas dataframe

Answers (1)

Related Questions