LaurentiuMa

Reputation: 55

Plotting two data frames (x and y) that do not have the same dimensionality?

I have recently built an SVR model on the diamonds dataset to predict the price of a diamond, based on some specific features. I was trying to plot the test features of my model against the predicted price. Below is an explanation of the variables used in the code.

X_test - I carried out a train/test split, and these are the features the model is tested on. Shape (10782, 7) (8 total features).

y_pred - After running the model, this holds the predicted price for each row of features in X_test. Shape (10782,).

Below is the code showing how these are produced:

diamonds_features = ['carat', 'x', 'y', 'z', 'color', 'cut', 'clarity']

X = df.loc[:, diamonds_features].values
y = df.iloc[:, 6:7].values

X_train, X_test, y_train, y_test = train_test_split(X, y.ravel(), test_size=0.20)

regressor = SVR(kernel='rbf', C=50, gamma = 10)
regressor.fit(X_train, y_train)

#produce test predictions
y_pred = regressor.predict(X_test)

Below is the code for plotting the outcome of the model.

colorGroup = ['b','g','r','c','m','y','k','w']
plt.figure(1)
for i in range(len(X_test)):
  col = colorGroup[i % 8]
  for j in range(8):
      plt.scatter(X_test[i:i+1, j:j+1], y_pred[i:i+1], color=col)

To work around the fact that X_test and y_pred have different shapes, I wanted to do the following:

For each value in y_pred (a 1-D array), take every value in the corresponding row of X_test and plot it against that y_pred value. I also wanted to use the modulo operator so that each feature gets a consistent colour (e.g. every carat point is drawn in the same colour throughout the plot).

Running this code raises the following error: "ValueError: x and y must be the same size"

If anyone could point out where I am going wrong with this, I would be grateful.

Here is the Traceback I am receiving:

Traceback (most recent call last):

  File "C:\Users\mypackage\SVM Model.py", line 72, in <module>
    plt.scatter(X_test[i:i+1, j:j+1], y_pred[i:i+1], color=colorGroup[i%8])

  File "C:\Users\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2890, in scatter
    __ret = gca().scatter(

  File "C:\Users\anaconda3\lib\site-packages\matplotlib\__init__.py", line 1438, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)

  File "C:\Users\anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py", line 411, in wrapper
    return func(*inner_args, **inner_kwargs)

  File "C:\Users\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 4441, in scatter
    raise ValueError("x and y must be the same size")

ValueError: x and y must be the same size
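
One way to see where the size mismatch can come from: the inner loop runs j over range(8), but diamonds_features lists 7 columns, so if X_test really has shape (10782, 7), the slice at j = 7 is empty. The sketch below uses small stand-in arrays (X_demo and y_demo are made-up names, not the real data) to show the sizes involved. As the traceback indicates, scatter rejects x and y of different sizes.

```python
import numpy as np

# Stand-in data: 5 rows and 7 feature columns, mirroring the stated shape (n, 7).
rng = np.random.default_rng(0)
X_demo = rng.random((5, 7))
y_demo = rng.random(5)

i = 0
valid = X_demo[i:i+1, 6:7]       # last real column (index 6)
past_end = X_demo[i:i+1, 7:8]    # index 7 does not exist; slicing returns an empty array
target = y_demo[i:i+1]

print(valid.shape, valid.size)        # (1, 1) 1
print(past_end.shape, past_end.size)  # (1, 0) 0
print(target.shape, target.size)      # (1,) 1
```

If that is the cause, looping over range(X_test.shape[1]) instead of a hard-coded 8 would avoid the empty slice.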

Edit: updated the question with the Traceback

Upvotes: 0

Views: 206

Answers (1)

Lior Cohen

Reputation: 5735

Based on your comment, I believe this is what you want.

As an example, I used 20 points with 3 features.

import numpy as np
import matplotlib.pyplot as plt

X_test = np.random.rand(20, 3)
y_pred = np.random.rand(20)
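n_features = X_test.shape[1]  # loop over columns (features), not rows

colorGroup = ['b','g','r','c','m','y','k','w']
plt.figure(1)

# One scatter call per feature: the whole column against all predictions
for j in range(n_features):
    col = colorGroup[j % len(colorGroup)]
    plt.scatter(X_test[:, j], y_pred, color=col)

plt.show()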
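
A slightly more reusable variant of the per-feature scatter, as a sketch under some assumptions: the helper name plot_features_vs_prediction, the axis labels, and the feature names passed in are all hypothetical, and the Agg backend is selected only so the snippet runs without a display (drop that line for interactive use). White is left out of the colour list because it would be invisible on the default white background.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_features_vs_prediction(X, y, names):
    """Scatter each column of X against y, one colour and legend entry per feature."""
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y need the same number of rows")
    colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']
    fig, ax = plt.subplots()
    for j, name in enumerate(names):
        # Whole column j against all predictions: both have X.shape[0] elements
        ax.scatter(X[:, j], y, color=colors[j % len(colors)], label=name, s=10)
    ax.set_xlabel("feature value")
    ax.set_ylabel("predicted price")
    ax.legend()
    return fig, ax

# Stand-in data with the same layout as the example above (20 points, 3 features)
rng = np.random.default_rng(0)
X_demo = rng.random((20, 3))
y_demo = rng.random(20)
fig, ax = plot_features_vs_prediction(X_demo, y_demo, ["carat", "x", "y"])
```

Because every feature shares one x-axis here, features on very different scales will bunch together; one subplot per feature may read better in that case.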


Upvotes: 1
