Abhinav Sharma

Reputation: 3

How to run linear regression model for each industry-year excluding firm i observations in R?

Here is the dput output of my dataset in R:

data1<-structure(list(Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 
1999, 2002, 1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000), 
    `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D", 
    "E", "E", "F", "F", "G", "G", "H", "H"), Industry = c("AUTO", 
    "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", 
    "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", 
    "Pharma", "Pharma"), X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 
    13, 15, 16, 17, 18, 19, 20, 21), Y = c(30, 31, 34, 35, 36, 
    38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50), Z = c(23, 
    29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 
    137, 143)), row.names = c(NA, -17L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`1` = 1L), class = "omit"))

I am trying to regress Y ~ X + Z for each industry-year, excluding firm i's own observations. For each firm, I want to estimate the linear regression model using all observations of its industry peers but none of its own. For example, for firm A I want to regress Y ~ X + Z using all observations of its industry peers (B, C and D) across time, excluding firm A's observations. Similarly, for firm B I want to use all observations of firms A, C and D (the same industry as B) across time, excluding firm B's observations. The same procedure applies to firms C and D. I want to do this for every firm within each industry. Please help.

Upvotes: 0

Views: 521

Answers (2)

Ben

Reputation: 30494

As mentioned by @bonedi you can use a nested loop to accomplish this. If you want to create models for individual industry-year combinations, you will need to subset your data by Industry and Year. You can loop over Firm name and exclude that firm before creating the model. Results can be stored in a list, named by industry-year-firm. It's not a pretty solution but it should get you closer.

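# Fitted models, keyed by "industry-year-firm"
lst <- list()

for (ind in unique(data1$Industry)) {
  # Years observed within this industry
  for (year in unique(data1[data1$Industry == ind, ]$Year)) {
    # Firms observed in this industry-year
    for (firm in unique(data1[data1$Industry == ind & data1$Year == year, ]$`Firm name`)) {
      # Same industry and year, leaving out the current firm's rows
      sub_data <- data1[data1$Industry == ind & data1$Year == year & data1$`Firm name` != firm, ]
      # Skip industry-years where the firm has no peer observations
      if (nrow(sub_data) > 0) {
        name <- paste(ind, year, firm, sep = '-')
        lst[[name]] <- lm(Y ~ X + Z, data = sub_data)
      }
    }
  }
}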
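
The question body also describes pooling peer observations "across time" rather than year by year. If that is the intent, a minimal sketch could look like the following. It assumes R 4.0 or later, rebuilds the question's sample data as a plain data.frame so it runs on its own, and uses a hypothetical helper `fit_excluding_firm` that is not part of the loop above.

```r
# Sample data copied from the question, rebuilt as a plain data.frame.
data1 <- data.frame(
  Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 1999, 2002,
           1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000),
  `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D",
                  "E", "E", "F", "F", "G", "G", "H", "H"),
  Industry = rep(c("AUTO", "Pharma"), times = c(9, 8)),
  X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21),
  Y = c(30, 31, 34, 35, 36, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50),
  Z = c(23, 29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 137, 143),
  check.names = FALSE  # keep the space in the `Firm name` column
)

# Hypothetical helper (an assumption, not from the answer): fit Y ~ X + Z on
# every row from the firm's industry, all years pooled, except its own rows.
fit_excluding_firm <- function(firm, df) {
  industry <- unique(df$Industry[df$`Firm name` == firm])
  peers <- df[df$Industry %in% industry & df$`Firm name` != firm, ]
  lm(Y ~ X + Z, data = peers)
}

# One model per firm, stored in a list named by firm.
firms <- unique(data1$`Firm name`)
peer_models <- setNames(lapply(firms, fit_excluding_firm, df = data1), firms)

# Number of peer observations behind each firm's model.
peer_obs <- sapply(peer_models, nobs)
```

Here `peer_models[["A"]]` is the model fitted on the rows of firms B, C and D only, and `coef(peer_models[["A"]])` returns its coefficients. Note that in the posted sample Z equals 6 * X + 17 on every row, so `lm()` should report NA for the Z coefficient; that comes from the data, not from the exclusion logic.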

Upvotes: 1

bonedi

Reputation: 11

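The displayed code isn't nice to read. But from what you write, I'd recommend a nested loop, e.g.:

fits <- list()
for (y in unique(data1$Year)) {
  for (comp in unique(data1$`Firm name`[data1$Year == y])) {
    # transform data: keep only companies in comp's industry for year y, but exclude comp
    ind <- data1$Industry[data1$`Firm name` == comp][1]
    peers <- data1[data1$Year == y & data1$Industry == ind & data1$`Firm name` != comp, ]
    # fit only when at least one peer observation exists
    if (nrow(peers) > 0) fits[[paste(y, comp, sep = "-")]] <- lm(Y ~ X + Z, data = peers)
  }
}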

Upvotes: 0
