Reputation: 63
I have two data frames. One (customers) of which contains all customers with "ID" and "Date Sign up".
id date_joined
123 2019-09-12 17:05:27.916301+00:00
1234 2019-12-29 14:39:34.024478+00:00
1235 2020-03-09 12:47:52.284569+00:00
1236 2020-03-19 10:07:41.222140+00:00
1237 2020-03-19 06:34:25.896585+00:00
In a second data frame (turnover) I have the customer ID and year_month and turnover_sum. Important to point out: For every customer ID there are several rows in this dataframe as it tracks the monthyl turnoverover over a longe period.
ID year_month monthly_turnover
264 2019-Oct 0
264 2019-Nov 200
264 2019-Dec 150
264 2020-Jan 30
265 2020-Oct 14
What I would like to do for every customer is to replace all values in the monthly_turnover column with Nan if the value in the year_month column is earlier than the "Date Sign up" of the respective user.
How to achieve this? I don't know where to begin.
Upvotes: 2
Views: 599
Reputation: 63
I found a solution to my problem. I have merged the two dataframes and then used np.where condition to compare the value of the year_month column and the date_joined column.
merged.pf["monthly_turnover"] = np.where(
merged.pf["year_month"] >= merged.pf["date_joined"],
merged.pf["monthly_turnover"], np.nan)
Thanks for the input anyways
Upvotes: 0
Reputation: 2575
You can do this in one (admittedly dense) line:
turnover.loc[turnover['year_month'] < customers['date'], 'monthly_turnover'] = np.nan
There's a lot packed into that single line, so let me unpack item by item:
turnover['year_month'] < customers['date']
Pandas knows about dates, so assuming your indices are matched pandas can automatically check if one date is after another. That clause will return a series of True and Falses.
We then take that series into the .loc function. which will match each row where the boolean is True, and then set 'monthly_turnover' to NaN.
You need to import numpy as np for np.nan to work.
This gave me the following output:
Let us know if you need more, and hope this helps!
Upvotes: 1