Lukas
Lukas

Reputation: 63

Replace Values with Nan based on Date and matching value in a different data frame

I have two data frames. One (customers) of which contains all customers with "ID" and "Date Sign up".

id                      date_joined
123 2019-09-12 17:05:27.916301+00:00
1234 2019-12-29 14:39:34.024478+00:00
1235 2020-03-09 12:47:52.284569+00:00
1236 2020-03-19 10:07:41.222140+00:00
1237 2020-03-19 06:34:25.896585+00:00

In a second data frame (turnover) I have the customer ID and year_month and turnover_sum. Important to point out: For every customer ID there are several rows in this dataframe as it tracks the monthyl turnoverover over a longe period.

ID  year_month       monthly_turnover
264  2019-Oct                 0
264  2019-Nov                 200
264  2019-Dec                 150
264  2020-Jan                 30
265  2020-Oct                 14

What I would like to do for every customer is to replace all values in the monthly_turnover column with Nan if the value in the year_month column is earlier than the "Date Sign up" of the respective user.

How to achieve this? I don't know where to begin.

Upvotes: 2

Views: 599

Answers (2)

Lukas
Lukas

Reputation: 63

I found a solution to my problem. I have merged the two dataframes and then used np.where condition to compare the value of the year_month column and the date_joined column.

merged.pf["monthly_turnover"] = np.where(
merged.pf["year_month"] >= merged.pf["date_joined"],
merged.pf["monthly_turnover"], np.nan)

Thanks for the input anyways

Upvotes: 0

BLimitless
BLimitless

Reputation: 2575

You can do this in one (admittedly dense) line:

turnover.loc[turnover['year_month'] < customers['date'], 'monthly_turnover'] = np.nan

There's a lot packed into that single line, so let me unpack item by item:

turnover['year_month'] < customers['date']

Pandas knows about dates, so assuming your indices are matched pandas can automatically check if one date is after another. That clause will return a series of True and Falses.

We then take that series into the .loc function. which will match each row where the boolean is True, and then set 'monthly_turnover' to NaN.

You need to import numpy as np for np.nan to work.

This gave me the following output:

enter image description here

Let us know if you need more, and hope this helps!

Upvotes: 1

Related Questions