Gill

Scatter plot of predicted vs. actual values with regression curve

I am trying to make scatter plots of predicted vs. actual values, with regression curves, using the following code. I am using different algorithms such as linear regression, SVM and Gaussian Process. I have tried the different plotting options shown below:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor


df=pd.read_excel('coded.xlsx')
dfnew=df[['FL','FW','TL','LL','KH']]

Y = df['KH']
X = df[['FL']]

X=X.values.reshape(len(X),1)
Y=Y.values.reshape(len(Y),1)

# Split the data into training/testing sets
X_train = X[:-270]
X_test = X[-270:]

# Split the targets into training/testing sets
Y_train = Y[:-270]
Y_test = Y[-270:]


#regressor = SVR(kernel = 'rbf')
#regressor.fit(X_train, np.ravel(Y_train))
#training the algorithm

regressor = GaussianProcessRegressor(random_state=42)
regressor.fit(X_train, Y_train)

y_pred = regressor.predict(X_test)

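# mean squared error over the 270 test examples
# (ravel both so shapes (270,) and (270, 1) can't broadcast to (270, 270))
mse = np.mean((np.ravel(y_pred) - np.ravel(Y_test))**2)

# root mean squared error
rmse = np.sqrt(mse)
print(rmse)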

#X_grid = np.arange(min(X), max(X), 0.01) #this step required because data is feature scaled.
#X_grid = np.arange(0, 15, 0.01) #this step required because data is feature scaled.

#X_grid = X_grid.reshape((len(X_grid), 1))
#plt.scatter(X, Y, color = 'red')
print('size of Y_train = {0}'.format(Y_train.size))
print('size of y_pred = {0}'.format(y_pred.size))

#plt.scatter(Y_train, y_pred, color = 'red')
#plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
#plt.title('GPR')
#plt.xlabel('Measured')
#plt.ylabel('Predicted')
#plt.show()
fig, ax = plt.subplots(1, figsize=(12, 6))
plt.plot(X[:, 0], Y_train, marker='o', color='black', linewidth=0)
plt.plot(X[:, 0], y_pred, marker='x', color='steelblue')
plt.suptitle("$GaussianProcessRegressor(kernel=RBF)$ [default]", fontsize=20)
plt.axis('off')
pass

But I get the following error: ValueError: x and y must have same first dimension, but have shapes (540,) and (270, 1)

What is a possible solution?
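The error can be reproduced in isolation. matplotlib's plot requires x and y to have the same length along the first axis; in the code above, X[:, 0] has 540 values while Y_train has only 270. A minimal sketch, using zero-filled placeholder arrays (hypothetical, chosen only to match those shapes):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder arrays with the same shapes as in the question:
# X has 540 rows, Y_train has 270 rows and one column.
X = np.arange(540, dtype=float).reshape(540, 1)
Y_train = np.zeros((270, 1))

message = ""
try:
    # Same mismatch as the question: full X column vs. training targets only.
    plt.plot(X[:, 0], Y_train, marker="o", linewidth=0)
except ValueError as err:
    message = str(err)
    print(message)
```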

Answers (1)

Bill the Lizard

Your code splits X and Y into training and test sets, but then plots X[:, 0], a column of the full X (540 values), against Y_train and y_pred, which each have only 270 values, half as many as X. Plot Y_train against X_train[:, 0] and y_pred against X_test[:, 0] instead, so that x and y have the same length in each call.
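A minimal sketch of that fix, under stated assumptions: since coded.xlsx isn't available, it uses hypothetical synthetic data (a noisy sine curve) with the same 540/270 positional split; it fits on Y_train.ravel() so that y_pred comes back 1-D; and it sets alpha=0.1 on GaussianProcessRegressor (not in the question) so the fit stays numerically stable on noisy data. The left panel pairs each target series with its own inputs; the right panel is a measured-vs-predicted scatter with a y = x reference line, which is closer to what the title asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to show plots on screen
import matplotlib.pyplot as plt
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical synthetic data standing in for coded.xlsx: 540 rows, one feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(540, 1))
Y = (np.sin(X[:, 0]) + rng.normal(0, 0.2, size=540)).reshape(540, 1)

# Same positional split as in the question: last 270 rows for testing.
X_train, X_test = X[:-270], X[-270:]
Y_train, Y_test = Y[:-270], Y[-270:]

# alpha=0.1 is an assumption: it adds noise variance to the kernel diagonal.
regressor = GaussianProcessRegressor(alpha=0.1, random_state=42)
regressor.fit(X_train, Y_train.ravel())  # 1-D target -> 1-D predictions
y_pred = regressor.predict(X_test)       # shape (270,)

fig, (ax_fit, ax_pva) = plt.subplots(1, 2, figsize=(12, 6))

# Left: each y series is plotted against the X rows it belongs to.
ax_fit.plot(X_train[:, 0], Y_train[:, 0], marker="o", color="black",
            linewidth=0, label="training data")
order = np.argsort(X_test[:, 0])  # sort so the prediction line runs left to right
ax_fit.plot(X_test[order, 0], y_pred[order], marker="x", color="steelblue",
            label="GPR prediction (test set)")
ax_fit.set_xlabel("input feature")
ax_fit.set_ylabel("target")
ax_fit.legend()

# Right: predicted vs. measured on the test set, with the ideal y = x line.
ax_pva.scatter(Y_test[:, 0], y_pred, color="red")
lo = min(Y_test.min(), y_pred.min())
hi = max(Y_test.max(), y_pred.max())
ax_pva.plot([lo, hi], [lo, hi], color="blue")
ax_pva.set_xlabel("Measured")
ax_pva.set_ylabel("Predicted")

fig.suptitle("GaussianProcessRegressor")
fig.tight_layout()
# In an interactive session, call plt.show() here.
```

Sorting X_test only matters because the synthetic test inputs are in random order; without it, the prediction line would zigzag across the panel.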