Brian
Brian

Reputation: 45

R: Findest nearest smaller value/date from another dataframe

I have two data frames:

Ratings:

CUSIP   Date       Rating
BAEU    01.01.2014  A+
BAEU    30.01.2015  A
BAEU    28.02.2017  BB
BAEU    28.03.2018  BB
CUOD    01.03.2010  BBB
CUOD    02.03.2012  BB
CUOD    03.03.2016  AA
CUOD    04.03.2018  C
BBAE    20.06.2009  A
BBAE    21.06.2012  A+
BBAE    22.11.2015  B-
BBAE    23.06.2016  BBB

Output:

Date        CUSIP
01.05.2014  BAEU
01.01.2015  BAEU
01.02.2015  BAEU
01.01.2017  BAEU
02.01.2017  BAEU
15.03.2018  BAEU
01.05.2010  CUOD
02.08.2012  CUOD
01.01.2016  CUOD
04.05.2018  CUOD
20.06.2010  BBAE
21.01.2012  BBAE
23.11.2015  BBAE
01.01.2016  BBAE
23.06.2016  BBAE

I'd like to add a column to the data frame "Output" that is called "Rating". This column comes from data frame "Ratings" and takes the rating, based on the CUSIP and the rating that is valid on that respective date.

The result should look as following:

Output_II (column Rating to be generated):

Date        CUSIP   Rating (to be generated, based on data frame Rating)
01.05.2014  BAEU    A+
01.01.2015  BAEU    A+
01.02.2015  BAEU    A
01.01.2017  BAEU    A
02.01.2017  BAEU    A
15.03.2018  BAEU    BB
01.05.2010  CUOD    BBB
02.08.2012  CUOD    BB
01.01.2016  CUOD    BB
04.05.2018  CUOD    C
20.06.2010  BBAE    A
21.01.2012  BBAE    A
23.11.2015  BBAE    B-
01.01.2016  BBAE    B-
23.06.2016  BBAE    BBB

I have already tried with dplyr and zoo. It looked something like this:

library(dplyr)
library(zoo)
Output_II = Output %>% 
  group_by(cusip, date) %>% 
  mutate(...)

However, I did not find a way to finishing the code.

Upvotes: 1

Views: 76

Answers (1)

Lennyy
Lennyy

Reputation: 6132

You could fully join the dataframes, arrange on CUSIP and Date, and use the na.locf() function from zoo to carry the last Rating observation forward to fill the NAs. Since you've arranged on CUSIP and Date, the missing Ratings will be replaced by the appropriate Rating observations. At last, you need to filter the resulting dataframe so that it only contains the rows from the original Output dataframe. A safe method for that last step could be a right_join with the original Output, which also assures Output is arranged in the same order as originally.

library(dplyr)
library(zoo)

Output %>% 
  full_join(Ratings) %>% 
  arrange(CUSIP, Date) %>% 
  mutate(Rating = na.locf(Rating)) %>% 
  right_join(Output)

         Date CUSIP Rating
1  2014-05-01  BAEU     A+
2  2015-01-01  BAEU     A+
3  2015-02-01  BAEU      A
4  2017-01-01  BAEU      A
5  2017-01-02  BAEU      A
6  2018-03-15  BAEU     BB
7  2010-05-01  CUOD    BBB
8  2012-08-02  CUOD     BB
9  2016-01-01  CUOD     BB
10 2018-05-04  CUOD      C
11 2010-06-20  BBAE      A
12 2012-01-21  BBAE      A
13 2015-11-23  BBAE     B-
14 2016-01-01  BBAE     B-
15 2016-06-23  BBAE    BBB

DATA:

Ratings <- read.table(text = "CUSIP   Date       Rating
                      BAEU    01.01.2014  A+
                        BAEU    30.01.2015  A
                      BAEU    28.02.2017  BB
                      BAEU    28.03.2018  BB
                      CUOD    01.03.2010  BBB
                      CUOD    02.03.2012  BB
                      CUOD    03.03.2016  AA
                      CUOD    04.03.2018  C
                      BBAE    20.06.2009  A
                      BBAE    21.06.2012  A+
                        BBAE    22.11.2015  B-
                        BBAE    23.06.2016  BBB", h = T )

Output <- read.table(text = "Date        CUSIP
01.05.2014  BAEU
                     01.01.2015  BAEU
                     01.02.2015  BAEU
                     01.01.2017  BAEU
                     02.01.2017  BAEU
                     15.03.2018  BAEU
                     01.05.2010  CUOD
                     02.08.2012  CUOD
                     01.01.2016  CUOD
                     04.05.2018  CUOD
                     20.06.2010  BBAE
                     21.01.2012  BBAE
                     23.11.2015  BBAE
                     01.01.2016  BBAE
                     23.06.2016  BBAE", h = T)

Ratings$Date <- as.Date(Ratings$Date, "%d.%m.%Y")
Output$Date <- as.Date(Output$Date, "%d.%m.%Y")

Based on comments, below code could be safer in case not all CUSIPS in Output have a Rating in Ratings from before the first CUSIP date in Output. Also, the data I used is slightly modified to show what happens:

Ratings <- read.table(text = "CUSIP   Date       Rating
                      BAEU    01.01.2014  A+
                        BAEU    30.01.2015  A
                      BAEU    28.02.2017  BB
                      BAEU    28.03.2018  BB
                      CUOD    01.03.2010  BBB
                      CUOD    02.03.2012  BB
                      CUOD    03.03.2016  AA
                      CUOD    04.03.2018  C
                      BBAE    20.06.2009  A
                      BBAE    21.06.2012  A+
                        BBAE    22.11.2015  B-
                        BBAE    23.06.2016  BBB
                      TEST 01.01.2018 AAA", h = T )

Output <- read.table(text = "Date        CUSIP
01.05.2014  BAEU
                     01.01.2015  BAEU
                     01.02.2015  BAEU
                     01.01.2017  BAEU
                     02.01.2017  BAEU
                     15.03.2018  BAEU
                     01.05.2010  CUOD
                     02.08.2012  CUOD
                     01.01.2016  CUOD
                     04.05.2018  CUOD
                     20.06.2010  BBAE
                     21.01.2012  BBAE
                     23.11.2015  BBAE
                     01.01.2016  BBAE
                     23.06.2016  BBAE
                     01.01.2017 TEST
                     01.01.2019 TEST", h = T)

Ratings$Date <- as.Date(Ratings$Date, "%d.%m.%Y")
Output$Date <- as.Date(Output$Date, "%d.%m.%Y")
library(dplyr)
library(zoo)


Output %>% 
  full_join(Ratings) %>% 
  arrange(CUSIP, Date) %>% 
  group_by(CUSIP) %>% 
  mutate(Rating = na.locf(Rating, na.rm = F)) %>% 
  right_join(Output)


         Date  CUSIP Rating
       <date> <fctr> <fctr>
 1 2014-05-01   BAEU     A+
 2 2015-01-01   BAEU     A+
 3 2015-02-01   BAEU      A
 4 2017-01-01   BAEU      A
 5 2017-01-02   BAEU      A
 6 2018-03-15   BAEU     BB
 7 2010-05-01   CUOD    BBB
 8 2012-08-02   CUOD     BB
 9 2016-01-01   CUOD     BB
10 2018-05-04   CUOD      C
11 2010-06-20   BBAE      A
12 2012-01-21   BBAE      A
13 2015-11-23   BBAE     B-
14 2016-01-01   BBAE     B-
15 2016-06-23   BBAE    BBB
16 2017-01-01   TEST   <NA>
17 2019-01-01   TEST    AAA

Upvotes: 2

Related Questions