Reputation: 145
I have the following dataframe:
account_id contract_id date_activated 2021-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00 2021-04-01 00:00:00 2021-05-01 00:00:00 2021-06-01 00:00:00
0 1 A 2020-12-04 200.0 200.0 200.0 0.0 0.0 0.0 0.0
1 1 B 2021-03-09 0.0 0.0 300.0 300.0 300.0 300.0 300.0
2 1 C 2021-04-25 0.0 0.0 0.0 0.0 100.0 100.0 100.0
I want to change the date_activated column when its month and year do not match the first monthly payment, i.e. the first non-zero value in the columns from the 4th onward. The revised date_activated should equal the column header of that earliest payment, so it only needs to be the first day of the month. Rows whose date_activated already matches the month and year of the first payment should be kept as they are.
I want output like this:
account_id contract_id date_activated 2021-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00 2021-04-01 00:00:00 2021-05-01 00:00:00 2021-06-01 00:00:00
0 1 A 2021-12-01 200.0 200.0 200.0 0.0 0.0 0.0 0.0
1 1 B 2021-02-01 0.0 0.0 300.0 300.0 300.0 300.0 300.0
2 1 C 2021-04-25 0.0 0.0 0.0 0.0 100.0 100.0 100.0
Here's the dictionary for the dataframe:
{'account_id': {0: 1, 1: 1, 2: 1},
'contract_id': {0: 'A', 1: 'B', 2: 'C'},
'date_activated': {0: Timestamp('2020-12-04 00:00:00'),
1: Timestamp('2021-03-09 00:00:00'),
2: Timestamp('2021-04-25 00:00:00')},
datetime.datetime(2021, 12, 1, 0, 0): {0: 200.0, 1: 0.0, 2: 0.0},
datetime.datetime(2021, 1, 1, 0, 0): {0: 200.0, 1: 0.0, 2: 0.0},
datetime.datetime(2021, 2, 1, 0, 0): {0: 200.0, 1: 300.0, 2: 0.0},
datetime.datetime(2021, 3, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 0.0},
datetime.datetime(2021, 4, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0},
datetime.datetime(2021, 5, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0},
datetime.datetime(2021, 6, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0}}
Here's the dictionary for the desired output:
{'account_id': {0: 1, 1: 1, 2: 1},
'contract_id': {0: 'A', 1: 'B', 2: 'C'},
'date_activated': {0: Timestamp('2021-12-01 00:00:00'),
1: Timestamp('2021-02-01 00:00:00'),
2: Timestamp('2021-04-25 00:00:00')},
datetime.datetime(2021, 12, 1, 0, 0): {0: 200.0, 1: 0.0, 2: 0.0},
datetime.datetime(2021, 1, 1, 0, 0): {0: 200.0, 1: 0.0, 2: 0.0},
datetime.datetime(2021, 2, 1, 0, 0): {0: 200.0, 1: 300.0, 2: 0.0},
datetime.datetime(2021, 3, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 0.0},
datetime.datetime(2021, 4, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0},
datetime.datetime(2021, 5, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0},
datetime.datetime(2021, 6, 1, 0, 0): {0: 0.0, 1: 300.0, 2: 100.0}}
Upvotes: 1
Views: 63
Reputation: 29982
You can use pandas.Series.first_valid_index() to find the label of the first non-null value in each row, after replacing 0 with np.nan in your monthly payment columns.
Then use pandas.Series.where() to keep date_activated where the condition is True and replace it where the condition is False.
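import numpy as np
import pandas as pd
# Column header of the first non-zero payment in each row (zeros masked as NaN)
idx = df.iloc[:, 3:].replace(0, np.nan).T.apply(pd.Series.first_valid_index)
# True where date_activated already falls in that month and year
m = (df['date_activated'].dt.year == idx.dt.year) & (df['date_activated'].dt.month == idx.dt.month)
df['date_activated'] = df['date_activated'].where(m, idx)
print(df['date_activated'])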
0 2021-12-01
1 2021-02-01
2 2021-04-25
Name: date_activated, dtype: datetime64[ns]
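For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question's dictionary and wraps the same idea in a function. The names align_activation and first_payment_col are made up for this illustration, and two details are my own assumptions rather than part of the snippet above: the pd.to_datetime() wrapper (so the result is datetime-like regardless of how apply infers the dtype) and the comparison via dt.to_period('M') instead of separate year and month checks.

```python
import datetime

import numpy as np
import pandas as pd

# Sample data rebuilt from the dictionary in the question.
df = pd.DataFrame({
    'account_id': [1, 1, 1],
    'contract_id': ['A', 'B', 'C'],
    'date_activated': pd.to_datetime(['2020-12-04', '2021-03-09', '2021-04-25']),
    datetime.datetime(2021, 12, 1): [200.0, 0.0, 0.0],
    datetime.datetime(2021, 1, 1): [200.0, 0.0, 0.0],
    datetime.datetime(2021, 2, 1): [200.0, 300.0, 0.0],
    datetime.datetime(2021, 3, 1): [0.0, 300.0, 0.0],
    datetime.datetime(2021, 4, 1): [0.0, 300.0, 100.0],
    datetime.datetime(2021, 5, 1): [0.0, 300.0, 100.0],
    datetime.datetime(2021, 6, 1): [0.0, 300.0, 100.0],
})


def align_activation(frame, first_payment_col=3):
    """Hypothetical helper: return a copy with date_activated aligned to the first payment."""
    payments = frame.iloc[:, first_payment_col:]
    # Header of the first non-zero payment per row, in column order.
    # Rows with no payment at all end up as NaT.
    first_paid = pd.to_datetime(
        payments.replace(0, np.nan).T.apply(pd.Series.first_valid_index)
    )
    # Keep the original date only when month and year already match.
    same_month = (
        frame['date_activated'].dt.to_period('M') == first_paid.dt.to_period('M')
    )
    out = frame.copy()
    out['date_activated'] = out['date_activated'].where(same_month, first_paid)
    return out


result = align_activation(df)
print(result['date_activated'])
```

On the sample data this should give the same three dates as the output above. Note that "first" means first in column order, not the earliest calendar date, which is why row 0 becomes 2021-12-01; if your columns are not in chronological order and you want the earliest date, sort the payment columns before calling it.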
Upvotes: 2