Reputation: 1211
I have experience using the inner_join function in R on dataframes in which the column values matches between two dataframes. However, I have one dataframe that has the average stock prices for each month for every stock from 2007-2014, and another dataframe that has each stock's financial ratios from 2007-2014 and shows the month of the fiscal year end for each company. The problem is, a company's financial ratios aren't reported until they release their 10K which is 3 months later. Thus, I want to match each company's financial ratios with their appropriate stock price which is 3 months afterwards.
RatioDF:
Symbol Month Year 10KRatio1 10KRatio2 ... 10KRatioN
FLWS 6 2007 100 200 ... 1000
ACAD 12 2007 500 600 ... 2000
StockPriceDF:
Company Year Month MeanPrice
FLWS 2007 1 6.32
. . . .
. . . .
. . . .
FLWS 2007 9 10.995
. . . .
. . . .
. . . .
FLWS 2014 12 17.92
. . . .
ACAD 2007 1 7.5
. . . .
. . . .
. . . .
ACAD 2008 3 8.64
. . . .
. . . .
DesiredDF:
Symbol Month Year 10KRatio1 10KRatio2 ... 10KRatioN MeanPrice
FLWS 9 2007 100 200 1000 10.995
ACAD 3 2008 500 600 2000 8.64
I was thinking of using a for loop to check if the RatioDF months is from 10-12 and then just match it with the months 1-3 of next year for the appropriate Symbol/Company, but I think computation might take too long since there are a lot of stocks for these years and a lot of monthly prices too.
Upvotes: 1
Views: 47
Reputation: 83255
A possible solution with lubridate
and either data.table
or dplyr
.
1) with data.table:
# load packages
library(lubridate)
library(data.table)
# convert both dataframes to data.table's and add a 'date'-variable
setDT(d1)[, date := as.IDate(sprintf('%s-%02d-01',Year,Month))][]
# idem + substract 3 months with lubridate's '%m-%` function
setDT(d2)[, date := as.IDate(sprintf('%s-%02d-01',Year,Month)) %m-% months(3)][]
# join d1 with d2 and update d1 by reference
d1[d2, on = .(Symbol = Company, date), MeanPrice := MeanPrice][]
which gives:
Symbol Month Year 10KRatio1 10KRatio2 date MeanPrice 1: FLWS 6 2007 100 200 2007-06-01 10.995 2: ACAD 12 2007 500 600 2007-12-01 8.640
An alternative join-method could be:
d1[d2[, .(Company, date, MeanPrice)], on = .(Symbol = Company, date), nomatch = 0L][]
2) with dplyr:
# load packages
library(lubridate)
library(dplyr)
# add a 'date'-variable to 'd1'
# add a 'date'-variable to 'd2' and substract 3 months
# from that with lubridate's '%m-%` function
# select only 'Company', 'date' and 'MeanPrice' from 'd2'
# join 'd1' with 'd2'
d1 %>%
mutate(date = as.Date(sprintf('%s-%02d-01',Year,Month))) %>%
left_join(., d2 %>%
mutate(date = as.Date(sprintf('%s-%02d-01',Year,Month)) %m-% months(3)) %>%
select(Company, date, MeanPrice),
by = c('Symbol' = 'Company', 'date'))
which gives the same result:
Symbol Month Year 10KRatio1 10KRatio2 date MeanPrice 1 FLWS 6 2007 100 200 2007-06-01 10.995 2 ACAD 12 2007 500 600 2007-12-01 8.640
Used data:
d1 <- structure(list(Symbol = c("FLWS", "ACAD"),
Month = c(6L, 12L),
Year = c(2007L, 2007L),
`10KRatio1` = c(100L, 500L),
`10KRatio2` = c(200L, 600L)),
.Names = c("Symbol", "Month", "Year", "10KRatio1", "10KRatio2"), class = "data.frame", row.names = c(NA, -2L))
d2 <- structure(list(Company = c("FLWS", "FLWS", "FLWS", "ACAD", "ACAD"),
Year = c(2007L, 2007L, 2014L, 2007L, 2008L),
Month = c(1L, 9L, 12L, 1L, 3L),
MeanPrice = c(6.32, 10.995, 17.92, 7.5, 8.64)),
.Names = c("Company", "Year", "Month", "MeanPrice"), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2