Reputation: 485
I have a dataframe with following data:
id|invoice_no|invoice_date|change_previous_month|change_status
984974|110|2016-12-31|0|A
984974|8202|2017-01-30|-64864|D
115677|5505|2016-12-31|0|A
115677|5635|2017-01-30|58730|U
i want to create a new column 'event_date' from 'invoice_data' based on conditions on existing columns. the conditions are:
1) select maximum 'invoice_date' that is defined as the latest date and
2) select 'change_status' where 'change_status' == 'A' or 'change_status' == 'U'
The resulted dataframe should look like this:
id|invoice_no|invoice_date|change_previous_month|change_status|event_date
984974|110|2016-12-31|0|A|2016-12-31
984974|8202|2017-01-30|-64864|D|2016-12-31
115677|5505|2016-11-31|0|A|2017-01-30
115677|5635|2017-01-30|58730|U|2017-01-30
the event_date should be created from the invoice_date with satisfying the above two conditions ,thanks in advance for your help.
Upvotes: 1
Views: 542
Reputation: 862801
I think you need:
isin
with boolean indexing
sort_values
by column invoice_date
drop_duplicates
for last row per id
set_index
for create Series
map
by id
to new columns = (df[df['change_status'].isin(['A','U'])]
.sort_values('invoice_date')
.drop_duplicates('id', keep='last')
.set_index('id')['invoice_date'])
df['event_date'] = df['id'].map(s)
print (df)
id invoice_no invoice_date change_previous_month change_status \
0 984974 110 2016-12-31 0 A
1 984974 8202 2017-01-30 -64864 D
2 115677 5505 2016-12-31 0 A
3 115677 5635 2017-01-30 58730 U
event_date
0 2016-12-31
1 2016-12-31
2 2017-01-30
3 2017-01-30
Upvotes: 2