somi gim

Reputation: 17

Can I run a simple regression on a matrix?

data11 <- matrix(c(f11, p3, a3, b1, c1, d1), ncol = 6)
dimnames(data11) <- list(c('2015/16', '2016/17', '2017/18', '2018/19', '2019/20'), c('GPA', 'Sex', 'Fulltime', 'Indigenous', 'Co-op', 'International'))

I created a matrix from my data with the code above.

        GPA      Sex       Fulltime  Indigenous  Co-op     International
2015/16 2.738711 0.1957311 0.5429625 0.008433362 0.4104236 0.2378208    
2016/17 2.799184 0.1922954 0.5640596 0.01018903  0.420968  0.2330071    
2017/18 2.842297 0.2017633 0.5600541 0.008940075 0.4422708 0.2392785    
2018/19 2.858647 0.2008524 0.5799423 0.007858447 0.4233421 0.2367674    
2019/20 NA       0.2011515 0.5712549 0.007988816 0.4156681 0.242161

The table above is what I got.

I would like to predict the 2019/20 GPA using simple linear regression. I tried lm(), but it reported that it cannot be used on a matrix. I then tried to convert the matrix to a data frame, but data.frame() did not work and I could not install as.data.frame.

Is there any way to run the regression directly on the matrix?

Upvotes: 0

Views: 28

Answers (1)

Allan Cameron

Reputation: 173898

The model is over-fitted: with only four complete rows, it has more parameters (five predictors plus an intercept) than observations. If you use every column as a predictor, the prediction will come with a "rank deficient" warning. With that caveat, you can still get an estimate with:

predict(lm(GPA ~ ., data = as.data.frame(data11)), as.data.frame(data11)[5,])
#>  2019/20 
#> 2.843115 

This effectively drops your last two columns (Co.op and International) as predictors. The intercept plus the first three predictors already fit the four complete rows exactly, so extra predictors cannot improve the fit.
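To see which predictors were dropped, you can inspect the fitted model directly. This is a minimal sketch using the same values as the question; the object name `fit` is just an illustrative choice. Coefficients that lm() could not estimate are reported as NA.

```r
# Same data as in the question, entered directly as a data frame.
data11 <- data.frame(
  GPA           = c(2.738711, 2.799184, 2.842297, 2.858647, NA),
  Sex           = c(0.1957311, 0.1922954, 0.2017633, 0.2008524, 0.2011515),
  Fulltime      = c(0.5429625, 0.5640596, 0.5600541, 0.5799423, 0.5712549),
  Indigenous    = c(0.008433362, 0.01018903, 0.008940075, 0.007858447, 0.007988816),
  Co.op         = c(0.4104236, 0.420968, 0.4422708, 0.4233421, 0.4156681),
  International = c(0.2378208, 0.2330071, 0.2392785, 0.2367674, 0.242161),
  row.names     = c("2015/16", "2016/17", "2017/18", "2018/19", "2019/20")
)

# The row with a missing GPA is dropped when fitting, leaving 4 observations.
fit <- lm(GPA ~ ., data = data11)

coef(fit)        # predictors that could not be estimated show up as NA
fit$rank         # number of coefficients that were actually estimated
fit$df.residual  # residual degrees of freedom left after fitting
```

With zero residual degrees of freedom there is nothing left to estimate the error from, so the prediction should be treated as a rough number rather than a reliable forecast.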

Where data11 is:

data11 <- structure(list(GPA = c(2.738711, 2.799184, 2.842297, 2.858647, 
NA), Sex = c(0.1957311, 0.1922954, 0.2017633, 0.2008524, 0.2011515
), Fulltime = c(0.5429625, 0.5640596, 0.5600541, 0.5799423, 0.5712549
), Indigenous = c(0.008433362, 0.01018903, 0.008940075, 0.007858447, 
0.007988816), Co.op = c(0.4104236, 0.420968, 0.4422708, 0.4233421, 
0.4156681), International = c(0.2378208, 0.2330071, 0.2392785, 
0.2367674, 0.242161)), class = "data.frame", row.names = c("2015/16", 
"2016/17", "2017/18", "2018/19", "2019/20"))

data11
#>              GPA       Sex  Fulltime  Indigenous     Co.op International
#> 2015/16 2.738711 0.1957311 0.5429625 0.008433362 0.4104236     0.2378208
#> 2016/17 2.799184 0.1922954 0.5640596 0.010189030 0.4209680     0.2330071
#> 2017/18 2.842297 0.2017633 0.5600541 0.008940075 0.4422708     0.2392785
#> 2018/19 2.858647 0.2008524 0.5799423 0.007858447 0.4233421     0.2367674
#> 2019/20       NA 0.2011515 0.5712549 0.007988816 0.4156681     0.2421610
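Note that the data11 shown here is already a data frame. If you start from a matrix, as in the question, as.data.frame() is part of base R, so nothing needs to be installed. The sketch below rebuilds the matrix from the printed values and then fits a regression on a single predictor, which is what "simple linear regression" usually means. Choosing Fulltime as that predictor is purely an assumption for illustration, and the names `m`, `df`, `fit1` and `pred` are hypothetical.

```r
# Rebuild the matrix from the values printed in the question (filled column by column).
m <- matrix(
  c(2.738711, 2.799184, 2.842297, 2.858647, NA,
    0.1957311, 0.1922954, 0.2017633, 0.2008524, 0.2011515,
    0.5429625, 0.5640596, 0.5600541, 0.5799423, 0.5712549,
    0.008433362, 0.01018903, 0.008940075, 0.007858447, 0.007988816,
    0.4104236, 0.420968, 0.4422708, 0.4233421, 0.4156681,
    0.2378208, 0.2330071, 0.2392785, 0.2367674, 0.242161),
  ncol = 6,
  dimnames = list(
    c("2015/16", "2016/17", "2017/18", "2018/19", "2019/20"),
    c("GPA", "Sex", "Fulltime", "Indigenous", "Co-op", "International")
  )
)

# lm() needs a data frame; as.data.frame() is a base R function.
df <- as.data.frame(m)

# One predictor only (Fulltime is an arbitrary choice for this sketch).
fit1 <- lm(GPA ~ Fulltime, data = df)

# Predict the missing year from its row of predictor values.
pred <- predict(fit1, newdata = df["2019/20", , drop = FALSE])
pred
```

With one predictor and four complete rows, the fit keeps two residual degrees of freedom, so it avoids the rank-deficiency warning, although four observations are still very few to base a prediction on.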

Created on 2020-12-06 by the reprex package (v0.3.0)

Upvotes: 1
