Reputation: 75
I'm new to Python and I'm trying to run a regression with a number of different variables. So far I've narrowed it down to scikit-learn. I've been searching for hours but can't find a way to import the data and then run a linear regression on it that returns the coefficient of each variable. Any help is much appreciated. I have 15 columns that I want to run against the X.
X = Margin
Ys = A1, B1, C1, D1 etc.
Example set below:
Margin,A1
-8,110.7
-10,112
-1,106.7
9,109
-9,107.5
1,108.1
-19,109.2
Here's what I've got so far. I know it's not much:
import pandas as pd
data = pd.read_csv("NBA.csv")
Upvotes: 0
Views: 332
Reputation: 355
By convention in machine learning, X denotes the features and y the target. In your case that makes Margin the target (y) and A1, B1, C1, D1, etc. the features (X).
If you want to run a linear regression and extract the coefficients, you can do the following:
# import the needed libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
# Import the data
data = pd.read_csv("NBA.csv")
# Specify the features and the target
target = 'Margin'
features = data.columns.tolist() # The column names of your data, as a list
features.remove(target) # Remove the target from the list of features
# Train the model
model = LinearRegression() # Instantiate the model
model.fit(data[features].values, data[target].values) # Fit the model to the data
print(features) # Prints the name of each feature
print(model.coef_) # Prints the coefficient of each feature (in the same order as `features`)
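If matching the two printed lists by position is awkward, one option is to label the coefficients with their column names. The following is only a sketch: it inlines the sample rows from the question instead of reading NBA.csv so that it runs on its own, and the variable names `csv_text` and `coefficients` are my own choices for illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows copied from the question, inlined so the sketch runs without NBA.csv.
csv_text = """Margin,A1
-8,110.7
-10,112
-1,106.7
9,109
-9,107.5
1,108.1
-19,109.2
"""
data = pd.read_csv(io.StringIO(csv_text))

# Every column except the target is treated as a feature.
target = "Margin"
features = [col for col in data.columns if col != target]

model = LinearRegression()
model.fit(data[features], data[target])

# Label each coefficient with the name of the column it belongs to.
coefficients = pd.Series(model.coef_, index=features)
print(coefficients)

# The fitted intercept is stored separately from the coefficients.
print("Intercept:", model.intercept_)
```

With the full file, replacing the inlined text with `pd.read_csv("NBA.csv")` should give one labelled coefficient per feature column, assuming every column other than Margin is numeric.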
Upvotes: 3