Reputation: 718
I'm new to machine learning and I'm trying to use the linear model estimators that scikit-learn provides to predict the price of a used car. I tried different linear models, such as LinearRegression, Ridge, Lasso and ElasticNet, but in most cases all of them return a negative or near-zero score (-0.6 <= score <= 0.1).
Someone told me that this is caused by a multicollinearity problem, but I don't know how to solve it.
My sample code:
import numpy as np
import pandas as pd
from sklearn import linear_model
from sqlalchemy import create_engine
from sklearn.linear_model import Ridge
engine = create_engine('sqlite:///path-to-db')
query = "SELECT mileage, carcass, engine, transmission, state, drive, customs_cleared, price FROM cars WHERE mark='some mark' AND model='some model' AND year='some year'"
df = pd.read_sql_query(query, engine)
df = df.dropna()
df = df.reindex(np.random.permutation(df.index))
X_full = df[['mileage', 'carcass', 'engine', 'transmission', 'state', 'drive', 'customs_cleared']]
y_full = df['price']
n_train = -(len(X_full) // 5)  # integer division; hold out the last 20% of rows for testing
X_train = X_full[:n_train]
X_test = X_full[n_train:]
y_train = y_full[:n_train]
y_test = y_full[n_train:]
predict = [200000, 0, 2.5, 0, 0, 2, 0] # parameters of the car to predict
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
y_estimate = model.predict(X_test)
print("Residual sum of squares: %.2f" % np.mean((y_estimate - y_test) ** 2))
print("Variance score: %.2f" % model.score(X_test, y_test))
print("Predicted price: ", model.predict(predict))
Carcass, state, drive and customs_cleared are numeric codes that represent categorical types.
What is the correct way to implement this prediction? Should I do some data preprocessing, or use a different algorithm?
Thanks for any advice!
Upvotes: 6
Views: 9449
Reputation: 2487
Given that you are using Ridge regression, you should scale your variables using StandardScaler or MinMaxScaler, perhaps chained together in a Pipeline:
http://scikit-learn.org/stable/modules/pipeline.html#pipeline-chaining-estimators
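For instance, a minimal sketch (untested against your data, reusing X_train, y_train, X_test and predict from your snippet) that chains StandardScaler and Ridge so the scaling learned on the training split is re-applied automatically at prediction time:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Scale the features, then fit Ridge; the pipeline applies the same scaling at predict time
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print("Variance score: %.2f" % model.score(X_test, y_test))
print("Predicted price: ", model.predict(np.array(predict).reshape(1, -1)))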
If you were using vanilla linear regression, scaling wouldn't matter; but with Ridge regression, the regularization penalty term (controlled by alpha) treats differently scaled variables differently. See this discussion on stats:
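To illustrate (a toy sketch on synthetic data, not your dataset): the same linear relationship expressed on two different feature scales is penalized very differently by Ridge, so the coefficients end up shrunk by different amounts.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
x = rng.rand(200)                   # feature on a 0..1 scale
y = 3 * x + 0.1 * rng.randn(200)    # target depends linearly on x with true slope 3

# Same data, once as-is and once scaled up by 1e6 (think km vs mm)
small = Ridge(alpha=10.0).fit(x.reshape(-1, 1), y)
large = Ridge(alpha=10.0).fit((x * 1e6).reshape(-1, 1), y)

print(small.coef_)         # clearly shrunk below the true slope of 3 by the penalty
print(large.coef_ * 1e6)   # rescaled back: essentially unshrunk, close to 3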
Upvotes: 3