I have an issue with this linear regression model. The scatter plot shows data points well below zero, even though the data set contains no negative values. I've checked the shapes and minimum values of the data, and the graph should not show any negative values, but I cannot figure out why the scatter plot suggests they are present.
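Code for the metrics function:
```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(y_test, price_pred):
    # Note: the coefficients are read from a global model named price_linear
    # (defined elsewhere in the full code), not from the model whose
    # predictions are passed in as price_pred.
    gradient = price_linear.coef_
    intercept = price_linear.intercept_
    mile_mae = mean_absolute_error(y_test, price_pred)
    mile_mse = mean_squared_error(y_test, price_pred)
    mile_rmse = np.sqrt(mile_mse)
    mile_r2 = r2_score(y_test, price_pred)
    print(f'Gradient: {gradient}\n')
    print(f'Intercept: {intercept}')
    print(f'Mean absolute error: {mile_mae}')
    print(f'Mean squared error: {mile_mse}')
    print(f'Root mean squared error: {mile_rmse}')
    print(f'Coefficient of determination: {mile_r2}')
```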
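For reference, the four metrics printed by `evaluate_model` can be written out directly with NumPy. This is a minimal sketch assuming 1-D inputs and a non-constant `y_true` (otherwise R² is undefined); `regression_metrics` is a hypothetical helper name and the example values are made up, not taken from the question's data set. It follows the same definitions as scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 with plain NumPy (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))           # mean absolute error
    mse = np.mean(residuals ** 2)              # mean squared error
    rmse = np.sqrt(mse)                        # root mean squared error
    ss_res = np.sum(residuals ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical example values, not from the question's data set
metrics = regression_metrics([10_000, 12_000, 15_000], [11_000, 11_500, 16_000])
print(metrics)
```

RMSE is simply the square root of MSE, which is why the question's code computes it with `np.sqrt(mile_mse)`.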
Code for the linear regression model:
```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# df is the car data DataFrame loaded earlier in the full code
numerical_inputs = ['Mileage', 'Year of manufacture', 'Engine size']
x = df[numerical_inputs]
y = df['Price']

# splitting of the data
x_num_train, x_num_test, y_price_train, y_price_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

# scaling the numerical data
scale = StandardScaler()
# fitting only to train data to prevent data leakage
scale.fit(x_num_train)
num_train_scaled = scale.transform(x_num_train)
num_test_scaled = scale.transform(x_num_test)

multi_price_linear = LinearRegression()
multi_price_linear.fit(num_train_scaled, y_price_train)
multi_price_pred = multi_price_linear.predict(num_test_scaled)

evaluate_model(y_price_test, multi_price_pred)

plt.figure(figsize=(14, 8))
plt.scatter(y_price_test, multi_price_pred, alpha=0.6)
plt.plot([min(y_price_test), max(y_price_test)],
         [min(y_price_test), max(y_price_test)], color='red')
plt.ylabel('Actual Price')
plt.xlabel('Predicted Price')
plt.title('Predicted Price vs Actual Price')
plt.show()
```
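Checking the minimum of the data only covers one of the two arrays passed to `plt.scatter`. A quick way to see which array actually contains the negative values is to inspect both. The sketch below uses a hypothetical helper, `sign_report`, and made-up numbers; with the real data you would pass `y_price_test` and `multi_price_pred`.

```python
import numpy as np

def sign_report(y_actual, y_predicted):
    """Report the minimum and the count of negative entries in each array (hypothetical helper)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return {
        "actual_min": y_actual.min(),
        "actual_negatives": int((y_actual < 0).sum()),
        "predicted_min": y_predicted.min(),
        "predicted_negatives": int((y_predicted < 0).sum()),
    }

# Made-up values for illustration only
report = sign_report([4_500, 9_000, 21_000], [-3_200, 8_100, 19_500])
print(report)
```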
Which results in the following output:
```
Gradient: [-2720.41736808 9520.41488938 6594.02448017]

Intercept: 13854.628699999997
Mean absolute error: 6091.458141656242
Mean squared error: 89158615.76017143
Root mean squared error: 9442.38400829851
Coefficient of determination: 0.671456306417368
```
Here is an image of the scatter plot:
I don't want to simply limit the axes to hide the negative values if they indicate an issue with the data or the code. Thank you! The full version of my code is linked here: google code
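You accidentally switched the axis labels. `plt.scatter(y_price_test, multi_price_pred)` puts the actual values on the X axis and the predicted values on the Y axis, so the axis labeled "Actual Price" is really showing predictions. The negative values are therefore model predictions, not data points: a linear regression is not constrained to positive outputs, so it can predict a negative price for some inputs. Swap the labels to `plt.xlabel('Actual Price')` and `plt.ylabel('Predicted Price')`.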
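To illustrate why the predicted values can be negative even when every price in the data is positive, here is a minimal sketch with hypothetical, made-up data: a single feature (mileage) fitted by ordinary least squares with `np.linalg.lstsq`. Nothing in the least-squares fit keeps predictions above zero, so a car with more mileage than anything seen in training gets a negative price.

```python
import numpy as np

# Hypothetical training data: price falls as mileage rises; every price is positive.
mileage = np.array([10_000, 30_000, 50_000, 70_000, 90_000], dtype=float)
price = np.array([20_000, 16_000, 12_000, 8_000, 4_000], dtype=float)

# Ordinary least squares fit of price = slope * mileage + intercept
design = np.column_stack([mileage, np.ones_like(mileage)])
(slope, intercept), *_ = np.linalg.lstsq(design, price, rcond=None)

# A car with more mileage than anything in the training data
new_mileage = 150_000.0
predicted_price = slope * new_mileage + intercept
print(f"slope={slope:.3f}, intercept={intercept:.1f}, prediction={predicted_price:.1f}")
```

With three features, as in the question, the same thing can happen for combinations of mileage, year and engine size that push the weighted sum below zero.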