Reputation: 47
CUSTOMER LOAN DATE LOAN_DEFAULT CUSTOMER_DEFAULT
1 101 201601 Y Y
1 102 201603 N Y
1 101 201501 Y Y
2 201 201601 N N
2 202 201603 N N
I have a dataset like above, there are multiple customers and each CUSTOMER may have several loans. A CUSTOMER with at least 1 LOAN_DEFAULT will be marked as CUSTOMER_DEFAULT.
I want to create a new column "DEFAULT_DATE" based on DATE and LOAN_DEFAULT at CUSTOMER Level, and the date is the earliest default date, like below:
CUSTOMER LOAN DATE LOAN_DEFAULT CUSTOMER_DEFAULT DEFAULT_DATE
1 101 201601 Y Y 201501
1 102 201603 N Y 201501
1 103 201501 Y Y 201501
2 201 201601 N N -
2 202 201603 N N -
How can I do this?
Upvotes: 1
Views: 35
Reputation: 887731
Here is an option with data.table
library(data.table)
setDT(df)[, Date := as.IDate(paste0(DATE, '01'), '%Y%m%d'))
][order(CUSTOMER, DATE), DEFAULT_DATE :=
DATE[match('Y', LOAN_DEFAULT)] , CUSTOMER][]
Upvotes: 0
Reputation: 389225
Assuming the DATE
column is of format year-month , we can add an arbitrary date to convert DATE
in Date class, arrange
the data by CUSTOMER
and DATE
and get the data of first LOAN_DEFAULT
for each CUSTOMER
.
library(dplyr)
df %>%
mutate(DATE = as.Date(paste0(DATE, '01'), '%Y%m%d')) %>%
arrange(CUSTOMER, DATE) %>%
group_by(CUSTOMER) %>%
mutate(DEFAULT_DATE = DATE[which(LOAN_DEFAULT == 'Y')[1]])
# CUSTOMER LOAN DATE LOAN_DEFAULT CUSTOMER_DEFAULT DEFAULT_DATE
# <int> <int> <date> <fct> <fct> <date>
#1 1 101 2015-01-01 Y Y 2015-01-01
#2 1 101 2016-01-01 Y Y 2015-01-01
#3 1 102 2016-03-01 N Y 2015-01-01
#4 2 201 2016-01-01 N N NA
#5 2 202 2016-03-01 N N NA
Upvotes: 1