Ramon
Ramon

Reputation: 81

Multiple regressions with lag in R

I would like to perform multiple rolling regression for different variables. My data is structured as follows:

Prices: 
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 10000  1200   13000  900
2001 90000  1300   12800  920
2002 98000  1280   11900  932
2003 ...
2004 ...
...
2020

The second dataframe with the variable I want to regress has the same structure:

Expenses:
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 5      12       8      18 
2001 3      13       10     20
2002 2      12       19     32
2003 ...
2004 ...
...
2020

I want to regress each Stock price on a rolling basis with a backwindow of 8 years. The code to calculate for example the coefficient of year 2010 would be:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2002:2009)

Furthermore, my overall aim is to perform for each stock 5 rolling regression with different lags from 1 to 5. Again the code for the coefficient with lag in year 2010 would look as follows:

lag1:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2001:2008)

lag2:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2000:2007)

and so on...

Lastly, I want to calculate the simple average of the coefficients of the five regression performed for each stock in each year with the different lags.

My desired output would look as follows for each year:

date   stock1                                         stock2
2010   (average of coefficients of different lags)    (average of coefficients of different lags)

Has someone an idea to perform this in an effective way?

Thanks in advance!

Upvotes: 0

Views: 150

Answers (1)

langtang
langtang

Reputation: 24732

You can set your dataframes to data.table, melt them both into long format, join them together, and run all the regressions (lead=0 through lead=5) using lapply.

library(data.table)

setDT(prices)
setDT(expenses)

dt = melt(prices,id = "Date",variable.name = "stock", value.name = "price")[
  melt(expenses,id="Date", variable.name="stock",value.name="expense"),
  on=.(Date,stock)]

rbindlist(lapply(0:5, \(i) {
    dt[, lm(price~shift(expense,n=i, type="lead"))$coef[2], by=stock]
}))[, .(avg_over_leads = mean(V1)), by=stock]

Output:

    stock avg_over_leads
1: Stock1     0.13232853
2: Stock2     0.03691069
3: Stock3     0.13296958

Input:

set.seed(123)
prices = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)
expenses = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)

You can also do the same, using tidy approach:

dt = inner_join(
  pivot_longer(prices,-Date,values_to = "price"),
  pivot_longer(expenses,-Date,values_to="expense"),
  by=c("Date","name")
)
bind_rows(
  lapply(0:5, \(i) {
  dt %>% group_by(name) %>% 
    summarize(V1 = lm(price~lead(expense,n=i))$coef[2])
})) %>% 
  group_by(name) %>% 
  summarize(avg_over_leads = mean(V1))

Upvotes: 1

Related Questions