Create a prediction column using betas in R

Let's say I have a vector of betas (the coefficients of a regression) like this:

> ResultDos$coefficients[-1]
                            WOE_PPIExFoodEnergyTradeMoM 
                                               7.371144 
                        WOE_ChangeinHouseholdEmployment 
                                              13.089279 
      WOE_RetailSalesExAutoMoM.WOE_RetailSalesExAutoMoM 
                                              23.082144 
                                             WOE_PPIMoM 
                                              12.757599 
                                  WOE_PPIFinalDemandYoY 
                                               7.790093 
                    WOE_PPIMoM.WOE_RetailSalesExAutoMoM 
                                             -27.627089 
                               WOE_ExistingHomeSalesMoM 
                                              14.189292 
   WOE_ExistingHomeSalesMoM.WOE_PPIExFoodEnergyTradeMoM 
                                             -44.579969 

Its class is "numeric".

I also have a data frame that contains those columns plus some others. The column names are not fixed, which is why I can't hard-code the multiplication.

> head(OutputData,10)
                  Date OutputTrainData.Dependent
1  2013-01-01 22:00:00                       -18
2  2013-01-02 22:00:00                      -137
3  2013-01-03 22:00:00                        20
4  2013-01-04 22:00:00                        48
5  2013-01-07 22:00:00                       -36
6  2013-01-08 22:00:00                       -17
7  2013-01-09 22:00:00                       208
8  2013-01-10 22:00:00                        71
9  2013-01-11 22:00:00                        39
10 2013-01-14 22:00:00                       -76
   WOE_ExistingHomeSalesMoM.WOE_ExistingHomeSalesMoM
1                                          0.4179244
2                                          0.4179244
3                                          0.4179244
4                                          0.4179244
5                                          0.4179244
6                                          0.4179244
7                                          0.4179244
8                                          0.4179244
9                                          0.4179244
10                                         0.4179244
   WOE_RetailSalesExAutoMoM.WOE_RetailSalesExAutoMoM WOE_ChangeinHouseholdEmployment
1                                          0.6000675                      -0.8284745
2                                          0.6000675                      -0.8284745
3                                          0.6000675                      -0.8284745
4                                          0.6000675                       0.3242050
5                                          0.6000675                       0.3242050
6                                          0.6000675                       0.3242050
7                                          0.6000675                       0.3242050
8                                          0.6000675                       0.3242050
9                                          0.6000675                       0.3242050
10                                         0.6000675                       0.3242050
   WOE_ExistingHomeSalesMoM WOE_PPIExFoodEnergyTradeMoM WOE_PPIFinalDemandYoY  WOE_PPIMoM
1                 0.6464707                  -0.0820543              0.371575 -0.82847453
2                 0.6464707                  -0.0820543              0.371575 -0.82847453
3                 0.6464707                  -0.0820543              0.371575 -0.47707664
4                 0.6464707                  -0.0820543              0.371575 -0.16578655
5                 0.6464707                  -0.0820543              0.371575 -0.47707664
6                 0.6464707                  -0.0820543              0.371575  0.09306556
7                 0.6464707                  -0.0820543              0.371575  0.09306556
8                 0.6464707                  -0.0820543              0.371575  0.09306556
9                 0.6464707                  -0.0820543              0.371575 -0.20432022
10                0.6464707                  -0.0820543              0.371575 -0.20432022
   WOE_ExistingHomeSalesMoM.WOE_PPIExFoodEnergyTradeMoM
1                                            -0.0530457
2                                            -0.0530457
3                                            -0.0530457
4                                            -0.0530457
5                                            -0.0530457
6                                            -0.0530457
7                                            -0.0530457
8                                            -0.0530457
9                                            -0.0530457
10                                           -0.0530457
   WOE_ManufacturingSICProduction.WOE_RetailSalesExAutoMoM WOE_PPIMoM.WOE_RetailSalesExAutoMoM
1                                               -0.4889554                          0.64176968
2                                               -0.4889554                          0.64176968
3                                               -0.4889554                          0.36956275
4                                               -0.4889554                          0.12842493
5                                               -0.4889554                          0.36956275
6                                               -0.4889554                         -0.07209233
7                                               -0.4889554                         -0.07209233
8                                               -0.4889554                         -0.07209233
9                                               -0.4889554                          0.15827466
10                                              -0.4889554                          0.15827466

What I would like to do is create a new column "Fits" that multiplies each value in the data frame by the beta whose name matches the column name, and sums the products for each row. Can anyone help me?

As a proof of concept, a simpler version of the problem would be something like this:

Vector: (x1 = 10, x2 = 5, x3 = 1)

DF:

Day   x3    x2    x1
1     5      3    2
2     2      1    2
3     1      5    3

Output:

Day   x3    x2    x1   Fits
1     5      3    2     (5*1+3*5+2*10) = 40
2     2      1    2       27
3     1      5    3       56
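
The toy example above can be reproduced with a few lines of R. This is only a minimal sketch: the names `betas`, `DF` and `common` are made up for illustration, and it assumes every beta name that should be used appears as a column name in the data frame.

```r
# Toy data from the example above; the column order deliberately
# differs from the order of the betas
betas <- c(x1 = 10, x2 = 5, x3 = 1)
DF <- data.frame(Day = 1:3, x3 = c(5, 2, 1), x2 = c(3, 1, 5), x1 = c(2, 2, 3))

# Columns that have a matching beta (Day has none, so it is skipped)
common <- intersect(names(DF), names(betas))

# Indexing both sides by the same names lines each column up with its beta
DF$Fits <- as.vector(as.matrix(DF[common]) %*% betas[common])

DF$Fits
# 40 27 56
```

Because both the data frame and the vector are indexed by `common`, the result does not depend on the order in which the columns or the betas happen to be stored.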

SOLVED --

To solve this, I did the following (not the best solution as I'm new to R / coding):

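1.- Put the betas vector in the same order as the data frame columns:

Betas <- ResultDos$coefficients[-1]   # coefficients without the intercept
# position of each predictor column of OutputData within Betas
Orderlist <- sapply(names(OutputData[-c(1:2)]), function(x) which(x == names(Betas)))
Orderlist <- as.vector(Orderlist)
BetasInOrder <- as.vector(Betas[Orderlist])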

2.- Convert data into a matrix so I could do a Matrix Multiplication.

m <- as.matrix(OutputData[-c(1:2)])
Fits <- m%*%diag(BetasInOrder)

3.- Sum across the columns of each row and add the intercept

FitsValue <- rowSums(Fits)
FitsValue <- FitsValue + ResultDos$coefficients[1]
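
The three steps above can probably be collapsed into a single matrix-vector product, since `rowSums(m %*% diag(b))` is the same as `m %*% b`. The sketch below is an illustration under stated assumptions, not the original data: `OutputData` and `ResultDos` here are small simulated stand-ins, and `Common`, `FitsDiag` and `FitsValue` are names chosen for the example. It also uses `intersect()` so that data columns without a coefficient (here `x2`) are dropped instead of breaking the multiplication.

```r
set.seed(1)
# Hypothetical stand-in for OutputData: two leading non-predictor columns,
# then predictors, one of which (x2) is not used in the model
OutputData <- data.frame(
  Date = 1:50, Dependent = rnorm(50),
  x3 = runif(50), x2 = runif(50), x1 = runif(50)
)
ResultDos <- lm(Dependent ~ x1 + x3, data = OutputData)

Betas  <- coef(ResultDos)[-1]                            # drop the intercept
Common <- intersect(names(OutputData), names(Betas))     # columns with a beta
m      <- as.matrix(OutputData[Common])

# Steps 2 and 3 as written above: scale each column, then sum across columns
FitsDiag  <- unname(rowSums(m %*% diag(Betas[Common])) + coef(ResultDos)[1])
# Equivalent single matrix-vector product
FitsValue <- unname(as.vector(m %*% Betas[Common]) + coef(ResultDos)[1])

# Both versions agree with each other and with the model's fitted values
stopifnot(isTRUE(all.equal(FitsDiag, FitsValue)))
stopifnot(isTRUE(all.equal(FitsValue, unname(fitted(ResultDos)))))
```

One reason to prefer the single product: `diag()` called on a length-one numeric vector builds an identity matrix of that size rather than a 1x1 matrix, so the `diag()` version can misbehave when only one column matches.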

Upvotes: 0

Answers (1)

DanY

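Two options: (1) use predict(), or (2) compute X %*% beta, where X contains only the columns of your data that have a matching coefficient, in the same order as the coefficients (select them by name, e.g. with %in% or which). Note that you need cbind() to add a column of 1s to X, because the regression includes an intercept.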

# example data
set.seed(1234)
df <- data.frame(
    x1 = runif(100, 0, 10),
    x2 = runif(100, 0, 10),
    x3 = runif(100, 0, 10)
)
df$y <- 2 + 1*df$x1 + 3*df$x3 + rnorm(100, 0, 5)

# run regression of y on x1 and x3 (but not x2)
out <- lm(y ~ x1 + x3, data=df)

# option 1: use predict command
pred1 <- predict(out)

# option 2: use X %*% beta
# select the columns by coefficient name so they line up with coef(out)
X <- cbind(1, df[ , names(coef(out))[-1]])
pred2 <- as.matrix(X) %*% coef(out)
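
As a quick check, the two options can be compared on data whose columns are stored in a different order than the coefficients, which is the situation described in the question. This is a sketch that rebuilds the example data so it runs on its own; the data frame `shuffled` is a name introduced here for illustration.

```r
# Rebuild the example data and model from above
set.seed(1234)
df <- data.frame(
  x1 = runif(100, 0, 10),
  x2 = runif(100, 0, 10),
  x3 = runif(100, 0, 10)
)
df$y <- 2 + 1*df$x1 + 3*df$x3 + rnorm(100, 0, 5)
out <- lm(y ~ x1 + x3, data = df)

# Same data, but with the columns in a different order
shuffled <- df[ , c("x3", "y", "x2", "x1")]

# option 1: predict() looks variables up by name, so column order is irrelevant
pred1 <- predict(out, newdata = shuffled)

# option 2: select predictors by coefficient name, prepend 1s for the intercept
X <- cbind(1, as.matrix(shuffled[ , names(coef(out))[-1]]))
pred2 <- as.vector(X %*% coef(out))

# The two approaches should give the same predictions
stopifnot(isTRUE(all.equal(unname(pred1), pred2)))
```

Selecting by name with `names(coef(out))[-1]` keeps the columns of X aligned with the coefficients, whereas a logical `%in%` mask keeps them in the data frame's own order.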

Upvotes: 2
