Loop paneldata regression over list of x and y variables

Question

I am working with paneldata and wish to conduct a series of bivariate fixed effects regressions from a list of variables.

A small part of my dataset looks like this:

library(plm)
library(dplyr)

structure(list(v2x_libdem = c(0.876, 0.88, 0.763, 0.779), v2x_partipdem = c(0.679, 
0.68, 0.647, 0.652), v2x_frassoc_thick = c(0.937, 0.937, 0.9, 
0.899), country_name = c("Sweden", "Sweden", "Hungary", "Hungary"
), year = c(2000, 2008, 2000, 2008)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 4 × 5
  v2x_libdem v2x_partipdem v2x_frassoc_thick country_name  year
                                      
1      0.876         0.679             0.937 Sweden        2000
2      0.88          0.68              0.937 Sweden        2008
3      0.763         0.647             0.9   Hungary       2000
4      0.779         0.652             0.899 Hungary       2008

My list of variables looks something like this:

vars <- c("v2x_libdem", "v2x_partipdem", "v2x_frassoc_thick")

These variables ought to be both x and y, and thus every combination of the variables as x and y should be in a regression.

I have written a function that conducts the paneldata modelling:

paneldata_function <- function (y, x) {     #this function will do a PLM 
  plm(paste(y, "~", x), 
      data = Vdem_west,
      model = "within",
      index = c("year", "country_name"))   #for a given x & y variable
}

It is this function I need to loop over with each combination of x and y. Preferably, I would want the output to a be four vectors of values which I might turn into a dataset; one for the coefficient, one for the x variable, one for the y variable and one for the p-value.

If I do one single model, I can access these easily:

e <- paneldata_function("v2x_libdem", "v2x_partipdem")

p_value <- e[["fstatistic"]][["p.value"]]

x_var <- names(e[["model"]])[2]

y_var <- names(e[["model"]])[1]

How might I write the loop so that I get these vectors as an output?

Any help is very appreciated, and I hope I phrased my question as clearly as possible.

Kat · Accepted Answer

To loop through the possible fields, first, you need something that gives you that information. This function will work regardless of your dataset size, you'll have to adjust what fields to keep, though. (For example, here you needed to exclude the last two fields, so I select 1:3. There are a lot of ways to make this list, but many make duplicates (or the same two fields, but one time is left-right and the next is right-left). This method uses the package RcppAlgos. The 2 is for how many variables you would like. The F or FALSE renders the return unique pairs only.

ndf = names(Vdem_west)   
cbo = RcppAlgos::comboGeneral(ndf[1:3], 2, F) # make a list of possible combinations
#      [,1]            [,2]               
# [1,] "v2x_libdem"    "v2x_partipdem"    
# [2,] "v2x_libdem"    "v2x_frassoc_thick"
# [3,] "v2x_partipdem" "v2x_frassoc_thick"

Then using your function as it is now, I call it in this map call. map is in tidyverse.

# run all the models; store results in list of lists
res = map(1:nrow(cbo),
          ~paneldata_function(cbo[.x, 1], cbo[.x, 2]))

# name the models
names(res) <- paste0(cbo[, 1], "_", cbo[, 2])
names(res) # check it
# [1] "v2x_libdem_v2x_partipdem"        "v2x_libdem_v2x_frassoc_thick"   
# [3] "v2x_partipdem_v2x_frassoc_thick"

For the coefficient data, I wrote the output to a data frame. You didn't ask for the model name, but I added that as well.

map_dfr(1:length(res),
        .f = function(x){
          estP = summary(res[[x]])$coefficients[c(1, 4)]
          coef = dimnames(summary(res[[x]])$coefficients)[1]
          model = names(res)[x]
          c(Coefficient = coef, Est = round(estP[1], 4), 
            PV = round(estP[2], 4), Model = model)
        })
# # A tibble: 3 × 4
#   Coefficient         Est     PV Model                          
#                                             
# 1 v2x_partipdem     3.56  0.0067 v2x_libdem_v2x_partipdem       
# 2 v2x_frassoc_thick 2.85  0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 v2x_frassoc_thick 0.799 0.0509 v2x_partipdem_v2x_frassoc_thick

I also put together a call, so that you could look at the model performance, as well. This could be combined with the previous call, but you didn't ask for this part. (I actually did this part first and realized I wasn't answering your question.)

map_dfr(1:length(res),
        .f = function(x){
          R2 = summary(res[[x]])[["r.squared"]][["rsq"]] %>% round(4)
          Adj.R2 = summary(res[[x]])[["r.squared"]][["adjrsq"]] %>% round(4)
          PV = summary(res[[x]])[["fstatistic"]][["p.value"]] %>% round(4)
          Model = names(res)[x]
          c(R2 = R2, Adj.R2 = Adj.R2, PV = PV, Model = Model)
        })
# # A tibble: 3 × 4
#   R2     Adj.R2 PV.F   Model                          
#                                   
# 1 0.9999 0.9997 0.0067 v2x_libdem_v2x_partipdem       
# 2 0.9952 0.9856 0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 0.9936 0.9809 0.0509 v2x_partipdem_v2x_frassoc_thick

You could use this same approach to test your model and look at the assumptions.

Loop paneldata regression over list of x and y variables

Answers (1)

Related Questions