leakie

Reputation: 1

Scatter plot showing data points with negative values when there are no negative values in the dataset

I have an issue with this linear regression model. The scatter plot shows points well below zero even though the dataset contains no negative values. I've checked the shapes and minimum values, so the graph should not be showing negative values, and I cannot figure out why the scatter plot suggests they are present.

Code for the metrics definition:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(y_test, price_pred):
    # coefficients and intercept of the fitted model (multi_price_linear, defined below)
    gradient = multi_price_linear.coef_
    intercept = multi_price_linear.intercept_

    # error metrics comparing the test targets with the predictions
    mile_mae = mean_absolute_error(y_test, price_pred)
    mile_mse = mean_squared_error(y_test, price_pred)
    mile_rmse = np.sqrt(mile_mse)
    mile_r2 = r2_score(y_test, price_pred)

    print(f'Gradient: {gradient}\n')
    print(f' Intercept: {intercept}')

    print(f' Mean absolute error: {mile_mae}')
    print(f' Mean squared error: {mile_mse}')
    print(f' Root mean squared error: {mile_rmse}')
    print(f' Coefficient of determination: {mile_r2}')

Code for the linear regression model:

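import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# df is defined earlier in the full code
numerical_inputs = ['Mileage', 'Year of manufacture', 'Engine size']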

x = df[numerical_inputs]
y = df['Price']

# splitting of the data
x_num_train, x_num_test, y_price_train, y_price_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

# scaling the numerical data 
scale = StandardScaler()

# fitting only to train data to prevent data leakage 
scale.fit(x_num_train)
num_train_scaled = scale.transform(x_num_train)
num_test_scaled = scale.transform(x_num_test)

multi_price_linear = LinearRegression()

multi_price_linear.fit(num_train_scaled, y_price_train)

multi_price_pred = multi_price_linear.predict(num_test_scaled)

evaluate_model(y_price_test, multi_price_pred)

# plt.show()
plt.figure(figsize=(14, 8))
plt.scatter(y_price_test, multi_price_pred, alpha=0.6)
plt.plot([min(y_price_test), max(y_price_test)],
         [min(y_price_test), max(y_price_test)], color='red')
plt.ylabel('Actual Price')
plt.xlabel('Predicted Price')
plt.title('Predicted Price vs Actual Price')
plt.show()

Which results in the following output:

Gradient: [-2720.41736808  9520.41488938  6594.02448017] 
 Intercept: 13854.628699999997

 Mean absolute error: 6091.458141656242 
 Mean squared error: 89158615.76017143 
 Root mean squared error: 9442.38400829851 
 Coefficient of determination: 0.671456306417368

Here is an image of the scatter plot:

[Image: scatter plot with negative values]

I don't want to simply limit the axes to hide the negative values if they indicate some issue with the data or code. The full version of my code is here: google code. Thank you!

Upvotes: 0

Views: 53

Answers (1)

Sheer Wolff

Reputation: 11

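You accidentally switched the axis labels. In plt.scatter(y_price_test, multi_price_pred), the actual values are plotted on the X axis and the predicted values on the Y axis, but your labels say the opposite. The negative values are therefore predictions, not points from your dataset: a linear regression is not constrained to positive outputs, so it can predict a price below zero even when every price in the data is positive.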
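To make this concrete, here is a minimal sketch of the check and of the plot with the labels in the right place. It does not use your dataset: the mileage and price arrays are synthetic stand-ins I made up, and np.polyfit is used as a simple substitute for LinearRegression so the snippet runs with only NumPy and matplotlib. The point is just that the minimum of the actual values stays positive while the minimum of the predictions can drop below zero.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (hypothetical, NOT the asker's dataset):
# prices are strictly positive but fall off steeply with mileage.
rng = np.random.default_rng(42)
mileage = rng.uniform(0, 200_000, size=500)
price = 40_000 * np.exp(-mileage / 50_000) + rng.uniform(100, 1_000, size=500)

# Straight-line least-squares fit, standing in for LinearRegression.
slope, intercept = np.polyfit(mileage, price, deg=1)
price_pred = slope * mileage + intercept

# Compare the ranges: the data has no negatives, the predictions may.
actual_min = price.min()
pred_min = price_pred.min()
print(f"Minimum actual price:    {actual_min:.2f}")
print(f"Minimum predicted price: {pred_min:.2f}")
print(f"Negative predictions:    {(price_pred < 0).sum()}")

# Same plot as in the question, with labels matching the plotted arrays:
# first argument of scatter -> X axis (actual), second -> Y axis (predicted).
fig, ax = plt.subplots(figsize=(14, 8))
ax.scatter(price, price_pred, alpha=0.6)
ax.plot([price.min(), price.max()], [price.min(), price.max()], color="red")
ax.set_xlabel("Actual Price")
ax.set_ylabel("Predicted Price")
ax.set_title("Predicted Price vs Actual Price")
```

In your own code the equivalent check would be comparing y_price_test.min() with multi_price_pred.min(), and swapping the strings passed to plt.xlabel and plt.ylabel.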

Upvotes: 1
