Replace Values with Nan based on Date and matching value in a different data frame

Question

I have two data frames. One (customers) of which contains all customers with "ID" and "Date Sign up".

id                      date_joined
123 2019-09-12 17:05:27.916301+00:00
1234 2019-12-29 14:39:34.024478+00:00
1235 2020-03-09 12:47:52.284569+00:00
1236 2020-03-19 10:07:41.222140+00:00
1237 2020-03-19 06:34:25.896585+00:00

In a second data frame (turnover) I have the customer ID and year_month and turnover_sum. Important to point out: For every customer ID there are several rows in this dataframe as it tracks the monthyl turnoverover over a longe period.

ID  year_month       monthly_turnover
264  2019-Oct                 0
264  2019-Nov                 200
264  2019-Dec                 150
264  2020-Jan                 30
265  2020-Oct                 14

What I would like to do for every customer is to replace all values in the monthly_turnover column with Nan if the value in the year_month column is earlier than the "Date Sign up" of the respective user.

How to achieve this? I don't know where to begin.

Lukas · Accepted Answer

I found a solution to my problem. I have merged the two dataframes and then used np.where condition to compare the value of the year_month column and the date_joined column.

merged.pf["monthly_turnover"] = np.where(
merged.pf["year_month"] >= merged.pf["date_joined"],
merged.pf["monthly_turnover"], np.nan)

Thanks for the input anyways

Replace Values with Nan based on Date and matching value in a different data frame

Answers (2)

Related Questions