Damar Buana Murti
Damar Buana Murti

Reputation: 33

Python Pandas Add Value to Multiple Rows Based on Date

I need to solve a data science problem using only Python and Pandas where the given input is the date of check in, the date of check out, and the payments of every books from a single person

check_in   check_out  payment
2020-02-28 2020-03-02 66
2020-02-27 2020-02-29 100

I need to show how much daily earnings I got from that 2 books earlier. I got the idea that I have to split it for each date like 66/3 and distribute it from 02-28 to 03-02 and 100/2 then distribute it from 02-27 to 02-29 for the second order, then I will get result like this

date       earnings
2020-02-27 50
2020-02-28 22+50
2020-02-29 22
2020-03-01 22

Which will be like this if I summed them up

date       earnings
2020-02-27 50
2020-02-28 72
2020-02-29 22
2020-03-01 22

I have already divided the payment column from input and create a new dataframe with two new columns which represent how long did the guests stay using day_stay and how much earnings did I receive with daily_earn using the code below

df["date_stay"] = abs(df["check_in"] - df["check_out"]) # Get difference
df["date_stay"] = pd.to_numeric(df["date_stay"].dt.days) # Turn to days
df["daily_earn"] = df["payment"]/df["date_stay"]

Then the result will be look like this

check_in   check_out  payment day_stay daily_earn
2020-02-28 2020-03-02 66      3        22
2020-02-27 2020-02-29 100     2        50

The last step is to distribute the value in daily_earn to the corresponding date from 02-27 to 03-01 but I don't have any idea to do that. I have tried to use df.groupby("check_in").sum() but it didn't give the desired output since the number of rows between the input and output is different. Any ideas?

Upvotes: 2

Views: 433

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

Let's try a different approach:

df['date'] = df.apply(
    lambda r: pd.date_range(r['check_in'], r['check_out'], closed='left'),
    axis=1)

df = df.explode('date')

df['earnings'] = df['payment'] / df.groupby(level=0)['date'].transform('count')

df = df.groupby('date', as_index=False)['earnings'].agg('sum')

df:

        date  earnings
0 2020-02-27      50.0
1 2020-02-28      72.0
2 2020-02-29      22.0
3 2020-03-01      22.0

Breakdown of steps:

apply pd.date_range to each row to get the days between start and end:

df['date'] = df.apply(
    lambda r: pd.date_range(r['check_in'], r['check_out'], closed='left'),
    axis=1)
    check_in  check_out  payment                                                                                         date
0 2020-02-28 2020-03-02       66  DatetimeIndex(['2020-02-28', '2020-02-29', '2020-03-01'], dtype='datetime64[ns]', freq='D')
1 2020-02-27 2020-02-29      100                DatetimeIndex(['2020-02-27', '2020-02-28'], dtype='datetime64[ns]', freq='D')

Then explode the date into rows:

df = df.explode('date')
    check_in  check_out  payment       date
0 2020-02-28 2020-03-02       66 2020-02-28
0 2020-02-28 2020-03-02       66 2020-02-29
0 2020-02-28 2020-03-02       66 2020-03-01
1 2020-02-27 2020-02-29      100 2020-02-27
1 2020-02-27 2020-02-29      100 2020-02-28

Then groupby transform count date to get the number of dates and divide the payment by the number of days to get the daily earnings:

df['earnings'] = df['payment'] / df.groupby(level=0)['date'].transform('count')
    check_in  check_out  payment       date  earnings
0 2020-02-28 2020-03-02       66 2020-02-28      22.0
0 2020-02-28 2020-03-02       66 2020-02-29      22.0
0 2020-02-28 2020-03-02       66 2020-03-01      22.0
1 2020-02-27 2020-02-29      100 2020-02-27      50.0
1 2020-02-27 2020-02-29      100 2020-02-28      50.0

Then groupby agg sum the earnings on the date to get the total per date:

df = df.groupby('date', as_index=False)['earnings'].agg('sum')
        date  earnings
0 2020-02-27      50.0
1 2020-02-28      72.0
2 2020-02-29      22.0
3 2020-03-01      22.0

DataFrame and imports used:

import pandas as pd

df = pd.DataFrame({'check_in': {0: '2020-02-28', 1: '2020-02-27'},
                   'check_out': {0: '2020-03-02', 1: '2020-02-29'},
                   'payment': {0: 66, 1: 100}})
df['check_in'] = pd.to_datetime(df['check_in'])
df['check_out'] = pd.to_datetime(df['check_out'])

Upvotes: 3

Related Questions