Fi-Stat

Reputation: 3

Using combinations of principal components in a regression model

I have a group of 51 variables to which I have applied Principal Component Analysis, and I selected six factors based on the Kaiser-Guttman criterion. I'm using R for my analysis and did this with the following code:
prca.searchwords <- prcomp(searchwords.ts, scale=TRUE)
summary(prca.searchwords)
prca.searchwords$sdev^2

Next, I would like to use these six extracted factors as explanatory variables in a dynamic linear regression model, in groups of one, two, three and four, and choose the regression model that explains the most variation in the dependent variable. The six variables are prca.searchwords$x[,1], prca.searchwords$x[,2], prca.searchwords$x[,3], prca.searchwords$x[,4], prca.searchwords$x[,5] and prca.searchwords$x[,6].

I convert each of these to a time series before using it in a regression:
prca.searchwords.1.ts <- ts(data=prca.searchwords$x[,1], freq=12, start=c(2004, 1))
prca.searchwords.2.ts <- ts(data=prca.searchwords$x[,2], freq=12, start=c(2004, 1))

I'm using the dynlm package in R for this (I chose to use dynamic regression because other regressions that I perform require lagged values of the independent variables).

For example with the first two factors it would look like this:
private.consumption.searchwords.dynlm <- dynlm(monthly.privateconsumption.ts ~ prca.searchwords.1.ts + prca.searchwords.2.ts)
summary(private.consumption.searchwords.dynlm)

The problem I'm facing is that I would like to do this for all possible combinations of one, two, three and four of the six factors that I have chosen to use. This means six regressions with one variable, 15 with two variables, 20 with three variables and 15 with four variables. I would like to do this as efficiently as possible, without having to type 56 different regressions manually.

I'm a relatively new R user, so I still struggle with the general coding tricks that would radically speed up my analysis. Could someone please point me in the right direction?

Thank you!

Upvotes: 0

Views: 531

Answers (1)

MrFlick

Reputation: 206197

You could build all the formulas you are interested in running using string manipulation functions, then convert those strings to proper formulas and apply dynlm over the list of models you want to run. For example:

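library(dynlm)

# Names of the six component time series
vars <- paste0("prca.searchwords.",1:6,".ts")

# Right-hand sides for every combination of 1 to 6 variables
# (use 1:4 instead of 1:6 to stop at groups of four)
resp <- unlist(lapply(1:6, function(i) apply(combn(vars,i), 2, paste, collapse=" + ")))

# Fit one dynlm model per right-hand side
result <- lapply(resp, function(r) {
    do.call("dynlm", list(as.formula(paste0("monthly.privateconsumption.ts ~ ", r))))
})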
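
Once the models are fitted, you still need a way to pick one. The sketch below is one possible way to do that, not the only one: it limits the combinations to groups of one to four and ranks them by adjusted R-squared, since plain R-squared never decreases when a regressor is added and would therefore always favour the four-variable models. To keep it runnable without your data, it uses simulated series and base R's lm() as a stand-in for dynlm(); the names pcs, pc1 to pc6, y, rhs, fits and ranking are made up for illustration.

```r
# Simulated stand-ins for the six component series and the response.
# (Hypothetical data; replace with the real ts objects.)
set.seed(1)
n <- 120
pcs <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(pcs) <- paste0("pc", 1:6)
pcs$y <- 2 * pcs$pc1 - pcs$pc3 + rnorm(n)

vars <- paste0("pc", 1:6)

# Right-hand sides for all groups of one to four regressors:
# 6 + 15 + 20 + 15 = 56 models.
rhs <- unlist(lapply(1:4, function(k) combn(vars, k, paste, collapse = " + ")))

# Fit each model. lm() is an assumption made to keep the sketch
# dependency-free; with the real data, dynlm() would take its place.
fits <- lapply(rhs, function(r) lm(as.formula(paste("y ~", r)), data = pcs))

# Collect adjusted R-squared for every model and sort from best to worst.
ranking <- data.frame(
  model  = rhs,
  adj.r2 = sapply(fits, function(m) summary(m)$adj.r.squared)
)
ranking <- ranking[order(-ranking$adj.r2), ]
head(ranking, 5)
```

The same sapply() call should work on the result list above, because dynlm objects inherit from lm. If you prefer a different criterion, AIC() or BIC() can be applied to each fitted model in the same way.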

Upvotes: 3
