Reputation: 3
I am trying to use scatter plots with regression curves using the following code. I am using different algorithms like Linear regression, SVM and Gaussian Process etc. I have tried different options for plotting the data mentioned below
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
df=pd.read_excel(coded.xlsx)
dfnew=df[['FL','FW','TL','LL','KH']]
Y = df['KH']
X = df[['FL']]
X=X.values.reshape(len(X),1)
Y=Y.values.reshape(len(Y),1)
# Split the data into training/testing sets
X_train = X[:-270]
X_test = X[-270:]
# Split the targets into training/testing sets
Y_train = Y[:-270]
Y_test = Y[-270:]
#regressor = SVR(kernel = 'rbf')
#regressor.fit(X_train, np.ravel(Y_train))
#training the algorithm
regressor = GaussianProcessRegressor(random_state=42)
regressor.fit(X_train, Y_train)
y_pred = regressor.predict(X_test)
mse = np.sum((y_pred - Y_test)**2)
# root mean squared error
# m is the number of training examples
rmse = np.sqrt(mse/270)
print(rmse)
#X_grid = np.arange(min(X), max(X), 0.01) #this step required because data is feature scaled.
#X_grid = np.arange(0, 15, 0.01) #this step required because data is feature scaled.
#X_grid = X_grid.reshape((len(X_grid), 1))
#plt.scatter(X, Y, color = 'red')
print('size of Y_train = {0}'.format(Y_train.size))
print('size of y_pred = {0}'.format(y_pred.size))
#plt.scatter(Y_train, y_pred, color = 'red')
#plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
#plt.title('GPR')
#plt.xlabel('Measured')
#plt.ylabel('Predicted')
#plt.show()
fig, ax = plt.subplots(1, figsize=(12, 6))
plt.plot(X[:, 0], Y_train, marker='o', color='black', linewidth=0)
plt.plot(X[:, 0], y_pred, marker='x', color='steelblue')
plt.suptitle("$GaussianProcessRegressor(kernel=RBF)$ [default]", fontsize=20)
plt.axis('off')
pass
But I am getting error like: ValueError: x and y must have same first dimension, but have shapes (540,) and (270, 1)
What is the possible solution?
Upvotes: 0
Views: 937
Reputation: 405745
This code splits X
and Y
into training/testing sets, but then tries to plot a column from all of X
with Y_train
and y_pred
, which have only half as many values as X
. Try creating plots with X_train
and X_test
instead.
Upvotes: 1