Reputation: 81
I would like to perform multiple rolling regression for different variables. My data is structured as follows:
Prices:
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 10000 1200 13000 900
2001 90000 1300 12800 920
2002 98000 1280 11900 932
2003 ...
2004 ...
...
2020
The second dataframe with the variable I want to regress has the same structure:
Expenses:
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 5 12 8 18
2001 3 13 10 20
2002 2 12 19 32
2003 ...
2004 ...
...
2020
I want to regress each Stock price on a rolling basis with a backwindow of 8 years. The code to calculate for example the coefficient of year 2010 would be:
lm(prices$stock1(2003:2010) ~ expenses$stock1(2002:2009)
Furthermore, my overall aim is to perform for each stock 5 rolling regression with different lags from 1 to 5. Again the code for the coefficient with lag in year 2010 would look as follows:
lag1:
lm(prices$stock1(2003:2010) ~ expenses$stock1(2001:2008)
lag2:
lm(prices$stock1(2003:2010) ~ expenses$stock1(2000:2007)
and so on...
Lastly, I want to calculate the simple average of the coefficients of the five regression performed for each stock in each year with the different lags.
My desired output would look as follows for each year:
date stock1 stock2
2010 (average of coefficients of different lags) (average of coefficients of different lags)
Has someone an idea to perform this in an effective way?
Thanks in advance!
Upvotes: 0
Views: 150
Reputation: 24732
You can set your dataframes to data.table, melt them both into long format, join them together, and run all the regressions (lead=0 through lead=5) using lapply
.
library(data.table)
setDT(prices)
setDT(expenses)
dt = melt(prices,id = "Date",variable.name = "stock", value.name = "price")[
melt(expenses,id="Date", variable.name="stock",value.name="expense"),
on=.(Date,stock)]
rbindlist(lapply(0:5, \(i) {
dt[, lm(price~shift(expense,n=i, type="lead"))$coef[2], by=stock]
}))[, .(avg_over_leads = mean(V1)), by=stock]
Output:
stock avg_over_leads
1: Stock1 0.13232853
2: Stock2 0.03691069
3: Stock3 0.13296958
Input:
set.seed(123)
prices = data.frame(
Date = 2000:2025,
Stock1 = rnorm(26),
Stock2 = rnorm(26),
Stock3 = rnorm(26)
)
expenses = data.frame(
Date = 2000:2025,
Stock1 = rnorm(26),
Stock2 = rnorm(26),
Stock3 = rnorm(26)
)
You can also do the same, using tidy approach:
dt = inner_join(
pivot_longer(prices,-Date,values_to = "price"),
pivot_longer(expenses,-Date,values_to="expense"),
by=c("Date","name")
)
bind_rows(
lapply(0:5, \(i) {
dt %>% group_by(name) %>%
summarize(V1 = lm(price~lead(expense,n=i))$coef[2])
})) %>%
group_by(name) %>%
summarize(avg_over_leads = mean(V1))
Upvotes: 1