Reputation: 45
I have two data frames:
Ratings:
CUSIP Date Rating
BAEU 01.01.2014 A+
BAEU 30.01.2015 A
BAEU 28.02.2017 BB
BAEU 28.03.2018 BB
CUOD 01.03.2010 BBB
CUOD 02.03.2012 BB
CUOD 03.03.2016 AA
CUOD 04.03.2018 C
BBAE 20.06.2009 A
BBAE 21.06.2012 A+
BBAE 22.11.2015 B-
BBAE 23.06.2016 BBB
Output:
Date CUSIP
01.05.2014 BAEU
01.01.2015 BAEU
01.02.2015 BAEU
01.01.2017 BAEU
02.01.2017 BAEU
15.03.2018 BAEU
01.05.2010 CUOD
02.08.2012 CUOD
01.01.2016 CUOD
04.05.2018 CUOD
20.06.2010 BBAE
21.01.2012 BBAE
23.11.2015 BBAE
01.01.2016 BBAE
23.06.2016 BBAE
I'd like to add a column to the data frame "Output" that is called "Rating". This column comes from data frame "Ratings" and takes the rating, based on the CUSIP and the rating that is valid on that respective date.
The result should look as following:
Output_II (column Rating to be generated):
Date CUSIP Rating (to be generated, based on data frame Rating)
01.05.2014 BAEU A+
01.01.2015 BAEU A+
01.02.2015 BAEU A
01.01.2017 BAEU A
02.01.2017 BAEU A
15.03.2018 BAEU BB
01.05.2010 CUOD BBB
02.08.2012 CUOD BB
01.01.2016 CUOD BB
04.05.2018 CUOD C
20.06.2010 BBAE A
21.01.2012 BBAE A
23.11.2015 BBAE B-
01.01.2016 BBAE B-
23.06.2016 BBAE BBB
I have already tried with dplyr and zoo. It looked something like this:
library(dplyr)
library(zoo)
Output_II = Output %>%
group_by(cusip, date) %>%
mutate(...)
However, I did not find a way to finishing the code.
Upvotes: 1
Views: 76
Reputation: 6132
You could fully join the dataframes, arrange on CUSIP and Date, and use the na.locf() function from zoo to carry the last Rating observation forward to fill the NAs. Since you've arranged on CUSIP and Date, the missing Ratings will be replaced by the appropriate Rating observations. At last, you need to filter the resulting dataframe so that it only contains the rows from the original Output dataframe. A safe method for that last step could be a right_join with the original Output, which also assures Output is arranged in the same order as originally.
library(dplyr)
library(zoo)
Output %>%
full_join(Ratings) %>%
arrange(CUSIP, Date) %>%
mutate(Rating = na.locf(Rating)) %>%
right_join(Output)
Date CUSIP Rating
1 2014-05-01 BAEU A+
2 2015-01-01 BAEU A+
3 2015-02-01 BAEU A
4 2017-01-01 BAEU A
5 2017-01-02 BAEU A
6 2018-03-15 BAEU BB
7 2010-05-01 CUOD BBB
8 2012-08-02 CUOD BB
9 2016-01-01 CUOD BB
10 2018-05-04 CUOD C
11 2010-06-20 BBAE A
12 2012-01-21 BBAE A
13 2015-11-23 BBAE B-
14 2016-01-01 BBAE B-
15 2016-06-23 BBAE BBB
Ratings <- read.table(text = "CUSIP Date Rating
BAEU 01.01.2014 A+
BAEU 30.01.2015 A
BAEU 28.02.2017 BB
BAEU 28.03.2018 BB
CUOD 01.03.2010 BBB
CUOD 02.03.2012 BB
CUOD 03.03.2016 AA
CUOD 04.03.2018 C
BBAE 20.06.2009 A
BBAE 21.06.2012 A+
BBAE 22.11.2015 B-
BBAE 23.06.2016 BBB", h = T )
Output <- read.table(text = "Date CUSIP
01.05.2014 BAEU
01.01.2015 BAEU
01.02.2015 BAEU
01.01.2017 BAEU
02.01.2017 BAEU
15.03.2018 BAEU
01.05.2010 CUOD
02.08.2012 CUOD
01.01.2016 CUOD
04.05.2018 CUOD
20.06.2010 BBAE
21.01.2012 BBAE
23.11.2015 BBAE
01.01.2016 BBAE
23.06.2016 BBAE", h = T)
Ratings$Date <- as.Date(Ratings$Date, "%d.%m.%Y")
Output$Date <- as.Date(Output$Date, "%d.%m.%Y")
Based on comments, below code could be safer in case not all CUSIPS in Output have a Rating in Ratings from before the first CUSIP date in Output. Also, the data I used is slightly modified to show what happens:
Ratings <- read.table(text = "CUSIP Date Rating
BAEU 01.01.2014 A+
BAEU 30.01.2015 A
BAEU 28.02.2017 BB
BAEU 28.03.2018 BB
CUOD 01.03.2010 BBB
CUOD 02.03.2012 BB
CUOD 03.03.2016 AA
CUOD 04.03.2018 C
BBAE 20.06.2009 A
BBAE 21.06.2012 A+
BBAE 22.11.2015 B-
BBAE 23.06.2016 BBB
TEST 01.01.2018 AAA", h = T )
Output <- read.table(text = "Date CUSIP
01.05.2014 BAEU
01.01.2015 BAEU
01.02.2015 BAEU
01.01.2017 BAEU
02.01.2017 BAEU
15.03.2018 BAEU
01.05.2010 CUOD
02.08.2012 CUOD
01.01.2016 CUOD
04.05.2018 CUOD
20.06.2010 BBAE
21.01.2012 BBAE
23.11.2015 BBAE
01.01.2016 BBAE
23.06.2016 BBAE
01.01.2017 TEST
01.01.2019 TEST", h = T)
Ratings$Date <- as.Date(Ratings$Date, "%d.%m.%Y")
Output$Date <- as.Date(Output$Date, "%d.%m.%Y")
library(dplyr)
library(zoo)
Output %>%
full_join(Ratings) %>%
arrange(CUSIP, Date) %>%
group_by(CUSIP) %>%
mutate(Rating = na.locf(Rating, na.rm = F)) %>%
right_join(Output)
Date CUSIP Rating
<date> <fctr> <fctr>
1 2014-05-01 BAEU A+
2 2015-01-01 BAEU A+
3 2015-02-01 BAEU A
4 2017-01-01 BAEU A
5 2017-01-02 BAEU A
6 2018-03-15 BAEU BB
7 2010-05-01 CUOD BBB
8 2012-08-02 CUOD BB
9 2016-01-01 CUOD BB
10 2018-05-04 CUOD C
11 2010-06-20 BBAE A
12 2012-01-21 BBAE A
13 2015-11-23 BBAE B-
14 2016-01-01 BBAE B-
15 2016-06-23 BBAE BBB
16 2017-01-01 TEST <NA>
17 2019-01-01 TEST AAA
Upvotes: 2