Reputation: 43
program:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

ds = pd.read_csv('Animals.csv')
x = ds.iloc[:, 1].values
y = ds.iloc[:, 2].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# scikit-learn expects 2-D feature arrays, so reshape each split to a single column
x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
y_pred = array([[433.34494686],
[433.20384407],
[418.6791427 ],
[433.34789435],
[407.49640802],
[432.25311216]])
y_test = array([[ 119.5],
[ 157. ],
[5712. ],
[ 56. ],
[ 50. ],
[ 680. ]])
The predictions are far from the actual values. Why? Is there a problem with the dataset, or could it be something else? I'm new to machine learning. Thanks in advance.
Upvotes: 2
Views: 80
Reputation: 88236
It really depends on what you are trying to predict and on whether the features you have are good predictors. Even with a simple linear regression, if your target variable can be explained by the features, you should get reasonable error metrics.
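To put a number on how far off the predictions are, here is a minimal sketch that computes the mean absolute error and R² by hand with NumPy. The arrays are the `y_pred` and `y_test` values posted in the question; the variable names `mae` and `r2` are just illustrative.

```python
import numpy as np

# Values copied from the question's output
y_pred = np.array([433.34494686, 433.20384407, 418.6791427,
                   433.34789435, 407.49640802, 432.25311216])
y_test = np.array([119.5, 157.0, 5712.0, 56.0, 50.0, 680.0])

# Mean absolute error: average size of the prediction errors
mae = np.mean(np.abs(y_test - y_pred))

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE = {mae:.2f}")
print(f"R^2 = {r2:.3f}")
```

On these six points R² comes out negative, which means the model does worse than simply predicting the mean of `y_test`. With so few test samples, though, a single extreme value dominates both metrics.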
Looking at your y_test, which contains values as far apart as 50 and 5712, you should consider removing outliers, which will probably improve the accuracy of the model.
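One common way to flag outliers is the interquartile range (IQR) rule. The sketch below is only an illustration: the helper `iqr_mask` and the multiplier `k=1.5` are my own choices, and it is applied to the six target values shown in the question because the full dataset is not available.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Return a boolean mask that is True for values inside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values >= lower) & (values <= upper)

# Target values copied from the question's y_test
y = np.array([119.5, 157.0, 5712.0, 56.0, 50.0, 680.0])

mask = iqr_mask(y)
y_clean = y[mask]

print("kept:   ", y_clean)
print("dropped:", y[~mask])
```

In practice you would compute the mask on the training targets and apply the same mask to both `x_train` and `y_train` before fitting, so that features and targets stay aligned.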
You might also want to try more flexible regressors, such as RandomForestRegressor or a support vector regressor (SVR in scikit-learn), which can capture non-linear relationships that a linear regression cannot.
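As a rough sketch of how such a comparison could look, the code below fits the three models on synthetic, deliberately non-linear data (not the Animals.csv dataset, which I don't have) and compares their mean absolute error. The hyperparameters, such as `C=100.0` and `n_estimators=100`, are arbitrary assumptions and not tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: a quadratic relationship plus noise (assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR is sensitive to feature scale, so standardize the input first
    "SVR": make_pipeline(StandardScaler(), SVR(C=100.0)),
}

# Fit each model and record its mean absolute error on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.2f}")
```

Whether the non-linear models help on your data depends on the actual relationship between the feature and the target, so treat this as a template to adapt and not as a guarantee of better results.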
Upvotes: 1