J W

Reputation: 637

Multiple Linear Regression using Python

First, there are a few existing topics on this, but they rely on deprecated packages (pandas etc.). Suppose I'm trying to predict a variable w from variables x, y and z using multiple linear regression. Quite a few solutions produce the coefficients, but I'm not sure how to use them to get predictions. So, in pseudocode:

import numpy as np
from scipy import stats

w = np.array((1,2,3,4,5,6,7,8,9,10))  # Time series I'm trying to predict

x = np.array((1,3,6,1,4,6,8,9,2,2))   # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7)) 
z = np.array((1,3,4,7,4,8,5,1,8,2)) 

def model(w,x,y,z):
    # do something!

    return guess  # where guess is some 10 element array formed 
                  # using multiple linear regression of x,y,z

guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is 

Hopefully this makes sense, as I'm new to MLR. There is probably a package in scipy that does all this, so any help is welcome!

Upvotes: 1

Views: 3149

Answers (2)

Parth Verma

Reputation: 820

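You can use the normal equation method. Let your equation be of the form a*x + b*y + c*z + d = w. In the code below, x is the design matrix (one column per predictor plus a column of ones for d) and y holds the values of w. Then: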

import numpy as np

x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2],
                [1,1,1,1,1,1,1,1,1,1]]).T
y = np.asarray([1,2,3,4,5,6,7,8,9,10])

a,b,c,d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))
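To get the 10-element guess the question asks for, multiply the design matrix by the fitted coefficients. Here is a minimal sketch using the question's data. It swaps the explicit normal equation for `np.linalg.lstsq`, which solves the same least-squares problem. The names `X` and `coef`, and returning the coefficients alongside the guess, are my own choices and not part of the question.

```python
import numpy as np
from scipy import stats

w = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # series to predict
x = np.array([1, 3, 6, 1, 4, 6, 8, 9, 2, 2])   # the three predictors
y = np.array([2, 7, 6, 1, 5, 6, 3, 9, 5, 7])
z = np.array([1, 3, 4, 7, 4, 8, 5, 1, 8, 2])

def model(w, x, y, z):
    # Design matrix: one column per predictor plus a column of ones
    # so that the last coefficient plays the role of the intercept d.
    X = np.column_stack([x, y, z, np.ones_like(x, dtype=float)])
    # Least-squares solution of X @ coef ~= w (coef holds a, b, c, d)
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    # Fitted values: a*x + b*y + c*z + d for every row
    return X @ coef, coef

guess, coef = model(w, x, y, z)
r, p_value = stats.pearsonr(w, guess)  # how well guess tracks w
print(coef)
print(r)
```

Note that this correlation is measured on the same data used for the fit, so it says how well the model fits, not how well it would predict new values.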

Upvotes: 1

J W

Reputation: 637

I think I've figured it out now. If anyone could confirm that this produces the correct results, that'd be great!

import numpy as np
from scipy import stats

# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]  

# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],[-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],[-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])

# Fit linear least squares and get regression coefficients
# rcond=None avoids the FutureWarning that some NumPy 1.x versions emit
beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
print(beta_hat)

# Best guess for every row at once: y = x1*b1 + x2*b2
estimate = x.dot(beta_hat)


# Correlation between best guess and real values
print(stats.pearsonr(estimate,y))
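One thing that may be worth checking: the fit above has no intercept term, so the regression is forced through the origin. Here is a minimal sketch that assumes an intercept is wanted and appends a column of ones to the predictors. The names `X1` and `beta` are my own, and the data is the same as above.

```python
import numpy as np
from scipy import stats

# What I'm trying to predict
y = np.array([-6, -5, -10, -5, -8, -3, -6, -8, -8], dtype=float)

# Two predictors stored in columns
x = np.array([[-4.95, -4.55], [-10.96, -1.08], [-6.52, -0.81],
              [-7.01, -4.46], [-11.54, -5.87], [-4.52, -11.64],
              [-3.36, -7.45], [-2.36, -7.33], [-7.65, -10.03]])

# Append a column of ones so the last coefficient acts as the intercept
X1 = np.column_stack([x, np.ones(len(x))])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Fitted values: y = x1*b1 + x2*b2 + b0
estimate = X1 @ beta
print(beta)
print(stats.pearsonr(estimate, y))
```

With the intercept included, the fitted values have the same mean as y, and the squared error can only match or improve on the no-intercept fit.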

Upvotes: 0
