Ben

Reputation: 75

Loading CSV to Scikit Learn

I'm new to Python and I'm trying to run a regression with a bunch of different variables. So far I've narrowed it down to scikit-learn. I've been searching for hours but can't find a way to import the data, run a linear regression on it, and return the coefficient of each variable. Any help is much appreciated. I have 15 columns that I want to run against the X.

X = Margin
Ys = A1, B1, C1, D1 etc. 

Example set below:

Margin,A1
-8,110.7
-10,112
-1,106.7
9,109
-9,107.5
1,108.1
-19,109.2

Here's what I've got so far. I know it's not much:

import pandas as pd

data = pd.read_csv("NBA.csv")

Upvotes: 0

Views: 332

Answers (1)

Y.P

Reputation: 355

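By convention in machine learning, X denotes the features and y the target. In your case that makes Margin the target and the other columns (A1, B1, C1, D1, etc.) the features, which is the reverse of the labels in your question.

If you want to run a linear regression and extract the coefficients, you can do the following: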

# import the needed libraries
import pandas as pd
from sklearn.linear_model import LinearRegression

# Import the data
data = pd.read_csv("NBA.csv")

# Specify the features and the target
target = 'Margin'
features = data.columns.tolist() # This is the column names of your data as a list
features.remove(target) # We remove the target from the list of features

# Train the model
model = LinearRegression() # Instantiate the model
model.fit(data[features].values, data[target].values) # Fit the model to the data
print(features) # Prints the name of each feature
print(model.coef_) # Prints the coefficient of each feature (in the same order as the features list)
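Since the coefficients come back as a plain array, it can help to attach the column names to them directly. The following is only a minimal sketch of that idea: it uses the seven sample rows from the question, read from an in-memory string instead of NBA.csv, and the names csv_text and coefficients are illustrative, not part of the original code.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows copied from the question (assumption: the real NBA.csv
# has the same layout, with more feature columns such as B1, C1, ...)
csv_text = """Margin,A1
-8,110.7
-10,112
-1,106.7
9,109
-9,107.5
1,108.1
-19,109.2
"""
data = pd.read_csv(io.StringIO(csv_text))

# Margin is the target; every other column is treated as a feature
target = "Margin"
features = [col for col in data.columns if col != target]

model = LinearRegression()
model.fit(data[features], data[target])

# Label each coefficient with its column name so the order is explicit
coefficients = pd.Series(model.coef_, index=features, name="coefficient")
print(coefficients)
print("intercept:", model.intercept_)
```

With the full file, replacing the io.StringIO call with pd.read_csv("NBA.csv") should give one labelled coefficient per feature column.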

Upvotes: 3
