Reputation: 1367
I have this script, which uses RandomForest and LinearRegression to predict the values for a second dataset. That works fine, but the accuracy of the linear regression is 18%, which is too low.
So I'm trying RandomForest instead, but I don't know how to calculate the accuracy of that model.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
import numpy as np
import scipy
import matplotlib.pyplot as plt
from pylab import rcParams
import urllib
import sklearn
from sklearn.linear_model import RidgeCV, LinearRegression, Lasso
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import GridSearchCV
data = pd.read_csv('EncuestaVieja.csv')
X = data[['Edad','Sexo','v1','v2','v3']]
y = data['Alumna']
dataP = pd.read_csv('EncuestaVieja_test.csv')
X_p = dataP[['Edad','Sexo','v1','v2','v3']]
y_p = dataP['Alumna']
dataT = pd.read_csv('EncuestaVieja_test_2.csv')
X_t = dataT[['Edad','Sexo','v1','v2','v3']]
y_t = dataT['Alumna']
regr = LinearRegression()
regr.fit(X, y)
lr = RandomForestRegressor(n_estimators=50)
lr.fit(X, y)
X_test = pd.read_csv('EncuestaNueva.csv')[['Edad','Sexo','v1','v2','v3']]
predictions = regr.predict(X_test)
predictions2 = lr.predict(X_test)
print( 'RandomForest Accuracy: ')
print(((predictions2)))
print( '')
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_p,y_p)
accuracy = regressor.score(X_t,y_t)
print( 'Linear Regression Accuracy: ', accuracy*100,'%')
print(((predictions)))
OUTPUT:
RandomForest Accuracy:
[ 1.64 2.54 2.6 2.38 1.64 1.32 1.68 2.56 3. 2.28 2.38 2.68
2.9 2.5 2.78 1.96 1.56 2.6 2.12 2.76 2.74 1.66 1.68 2.12
2.3 2.36 2.28 2.28 2.82 1.7 1.86 2.36 1.24]
Linear Regression Accuracy: 18.1336149086 %
[ 1.2681851 1.02802219 3.13377072 2.96885127 2.30808853 1.98814349
2.39233726 2.8638321 1.86640316 2.63073399 2.21166731 2.25201016
1.95065189 2.65360517 3.08855254 1.01229733 2.18225606 2.41802534
2.43539473 2.50227407 1.71105799 1.88238089 2.12152321 3.33525397
2.72820453 2.43241713 2.88757874 2.6242382 2.63087916 1.98379487
2.25430306 1.96810279 0.8554685 ]
Upvotes: 3
Views: 22776
Reputation: 312
I think this is handled with the score() method
lr.score(x_test, y_test)
This will return the R^2 value (coefficient of determination) for your model. It looks like in your case you only have an x_test, though, and score() also needs the known target values for those rows. Note that R^2 is not accuracy: regression models are not evaluated with accuracy the way classification models are. Instead, metrics such as mean squared error or the coefficient of determination are computed. They show how closely the predicted values match the known values.
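As a rough illustration of the point above, here is a minimal sketch that scores a RandomForestRegressor on a held-out set with known targets. The survey CSV files are not available here, so the data is a synthetic stand-in generated with make_regression (an assumption, as are the names rf, r2, and mse); with the real data you would pass a labelled test set such as X_t and y_t instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data (assumption: 5 numeric features).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Hold out rows whose true targets are known, so the model can be scored.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)

# score() on a regressor returns R^2, not a classification accuracy.
r2 = rf.score(X_test, y_test)

# Mean squared error: average squared gap between true and predicted values.
mse = mean_squared_error(y_test, y_pred)

print("R^2:", r2)
print("MSE:", mse)
```

The value from rf.score(X_test, y_test) is the same quantity as r2_score(y_test, y_pred), so either call can be used; R^2 is at most 1, and a value near 0 or below means the model does little better than predicting the mean.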
Upvotes: 2