Raghavan vmvs

Reputation: 1265

Solve linear equations in many variables using R

I have the following data frame. (Note: my real data has over 100 columns and on the order of a hundred rows.)

  word1 word2   word3   word4   word5   Score
   1    1        1       1       1        10
   1    2        3       4       5        16
   2    1        0       1       2        13
   1    1        1       1       1        15
   1    2        3       4       5        16
   2    1        0       1       2        18
   1    1        1       1       1        10
   1    2        3       4       5        16
   2    1        0       1       2        13
   1    1        1       1       1        15
   1    2        3       4       5        16
   2    1        0       1       2        18
   1    1        1       1       1        10
   1    2        3       4       5        16
   2    1        0       1       2        13
   1    1        1       1       1        15
   1    2        3       4       5        16
   2    1        0       1       2        18

This is a system of linear equations in many variables. I want to solve it and get the actual values of word1, word2, word3, word4, etc. Score is predicted by word1, word2, word3, etc.

I have used

  lm(Score~., data=DF)

This returns values for a few coefficients and NA for the rest. Is there a reason for the NA values, and is there an alternative approach? Many thanks in advance.

Upvotes: 0

Views: 230

Answers (1)

kangaroo_cliff

Reputation: 6222

fit <- lm(Score ~ ., data = df)
fit

#Call:
#lm(formula = Score ~ ., data = df)

#Coefficients:
#(Intercept)        word1        word2        word3        word4        word5  
#        6.0          3.0          3.5           NA           NA           NA

If this is the output you get, the cause is multicollinearity in your data. When some predictors are exact linear combinations of others, lm cannot give a unique solution, so it drops some of the variables and reports NA for their coefficients.
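One way to confirm this is to compare the number of coefficients lm is asked to estimate with the rank of the design matrix. The sketch below rebuilds the sample from the question (assuming the 18 rows shown are the whole data set, and using the name df as in the fit above) and uses only base R:

```r
# Rebuild the sample from the question: three distinct predictor rows, repeated six times.
df <- data.frame(
  word1 = rep(c(1, 1, 2), times = 6),
  word2 = rep(c(1, 2, 1), times = 6),
  word3 = rep(c(1, 3, 0), times = 6),
  word4 = rep(c(1, 4, 1), times = 6),
  word5 = rep(c(1, 5, 2), times = 6),
  Score = rep(c(10, 16, 13, 15, 16, 18), times = 3)
)

# Design matrix used by lm(): an intercept column plus the five predictors.
X <- model.matrix(Score ~ ., data = df)

n_coef <- ncol(X)      # coefficients lm() tries to estimate: 6
x_rank <- qr(X)$rank   # coefficients it can actually identify: 3

# Predictors whose coefficients come back as NA.
fit <- lm(Score ~ ., data = df)
dropped <- names(which(is.na(coef(fit))))
dropped
# "word3" "word4" "word5"
```

With only three distinct predictor rows, the rank cannot exceed 3, so three of the six coefficients cannot be determined no matter how many times the rows are repeated.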

In your case, the multicollinearity is easy to see; see below. word2 and word4 are perfectly correlated, and several other pairs are highly correlated too. (NOTE: cor is not the best way to check for multicollinearity, as it only checks pairwise correlations.)

round(cor(df), 2)
#       word1 word2 word3 word4 word5 Score
# word1  1.00 -0.50 -0.76 -0.50 -0.28  0.23
# word2 -0.50  1.00  0.94  1.00  0.97  0.37
# word3 -0.76  0.94  1.00  0.94  0.84  0.19
# word4 -0.50  1.00  0.94  1.00  0.97  0.37
# word5 -0.28  0.97  0.84  0.97  1.00  0.47
# Score  0.23  0.37  0.19  0.37  0.47  1.00 
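Since cor only looks at pairs, a dependency involving three or more columns can go unnoticed. As a rough sketch (again assuming the 18 rows from the question are the whole data set), alias() from base R's stats package reports each dropped predictor as a linear combination of the ones that were kept. The three identities checked at the end are worked out by hand from the sample rows:

```r
# Same sample as in the question: three distinct predictor rows, repeated six times.
df <- data.frame(
  word1 = rep(c(1, 1, 2), times = 6),
  word2 = rep(c(1, 2, 1), times = 6),
  word3 = rep(c(1, 3, 0), times = 6),
  word4 = rep(c(1, 4, 1), times = 6),
  word5 = rep(c(1, 5, 2), times = 6),
  Score = rep(c(10, 16, 13, 15, 16, 18), times = 3)
)

fit <- lm(Score ~ ., data = df)

# Each row of the "Complete" table is a dropped predictor, written as a
# combination of the intercept and the predictors that lm() kept.
alias(fit)

# Check the exact dependencies directly on the data.
dep_word3 <- all(df$word3 == 2 * df$word2 - df$word1)      # TRUE
dep_word4 <- all(df$word4 == 3 * df$word2 - 2)             # TRUE
dep_word5 <- all(df$word5 == df$word1 + 4 * df$word2 - 4)  # TRUE
```

Note that word3 and word5 each depend on word1 and word2 together, which is exactly the kind of relationship a pairwise correlation matrix does not show as a 1.00.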

Upvotes: 2
