Reputation: 11

How to plot a graph of actual vs predict values in

          Original      Predicted
0           6            1.56
1           12.2         3.07
2           0.8          2.78
3           5.2          3.54

Code that I have tried:

def plotGraph(y_test,y_pred,regressorName):
    if max(y_test) >= max(y_pred):
        my_range = int(max(y_test))
    else:
        my_range = int(max(y_pred))
    plt.scatter(y_test, y_pred, color='red')
    plt.plot(range(my_range), range(my_range), 'o')
    plt.title(regressorName)
    plt.show()
    return

Graph that I wanted: [image not shown]

My current output: [image not shown]

Upvotes: 0

Answers (4)

curiousBrain

Reputation: 49

You are plotting y_test on the x-axis and y_pred on the y-axis. What you want instead is a common set of points on the x-axis, with both y_test and y_pred on the y-axis. The snippet below does that, where true_value and predicted_value are the lists to plot and common is the list from your dataframe used as the shared x-axis.

    fig = plt.figure()
    a1 = fig.add_axes([0,0,1,1])
    x = common
    a1.plot(x,true_value, 'ro')
    a1.set_ylabel('Actual')
    a2 = a1.twinx()
    a2.plot(x, predicted_value,'o')
    a2.set_ylabel('Predicted')
    fig.legend(labels = ('Actual','Predicted'),loc='upper left')
    plt.show()
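
For reference, here is a minimal self-contained sketch of the same twin-axis idea, filled in with the four rows from the question. Using the row index as `common` is an assumption, since the question does not say which column should serve as the x-axis; the combined legend via `get_legend_handles_labels()` is one possible way to label both series.

```python
import matplotlib.pyplot as plt

# Sample values copied from the table in the question.
true_value = [6, 12.2, 0.8, 5.2]
predicted_value = [1.56, 3.07, 2.78, 3.54]
# Assumption: the row index is used as the shared x-axis.
common = list(range(len(true_value)))

fig, a1 = plt.subplots()
a1.plot(common, true_value, 'ro', label='Actual')
a1.set_xlabel('Index')
a1.set_ylabel('Actual')

# Second y-axis that shares the x-axis of a1.
a2 = a1.twinx()
a2.plot(common, predicted_value, 'bo', label='Predicted')
a2.set_ylabel('Predicted')

# Collect the handles from both axes so that one legend covers both series.
handles1, labels1 = a1.get_legend_handles_labels()
handles2, labels2 = a2.get_legend_handles_labels()
a1.legend(handles1 + handles2, labels1 + labels2, loc='upper left')
# Call plt.show() to display the figure in an interactive session.
```

Note that twin axes are scaled independently, so the vertical positions of the red and blue points are not directly comparable. If they should share one scale, plotting both series on `a1` is the simpler choice.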

Upvotes: 1

Roland Deschain

Reputation: 2830

The problem seems to be that you mix y_test and y_pred into one "plot" (here, the single scatter() call).

For both scatter() and plot() (which you also mixed up), the first parameter is the x-axis coordinates and the second is the y-axis coordinates.

So 1.) you need one scatter() call with only y_test and another with only y_pred. To do this, 2.) you either need 2D data or, as seems to be the case here, can simply use the indexes as the x-axis via range().

Here is some code with random data that might get you started:

import matplotlib.pyplot as plt
import numpy as np


def plotGraph(y_test, y_pred, regressorName):
    # Use the sample index as the x-axis for both series.
    plt.scatter(range(len(y_test)), y_test, color='blue')
    plt.scatter(range(len(y_pred)), y_pred, color='red')
    plt.title(regressorName)
    plt.show()


y_test = range(10)
y_pred = np.random.randint(0, 10, 10)

plotGraph(y_test, y_pred, "test")

This will give you something like this: [image not shown]

Upvotes: 3

J.Lewandowski

Reputation: 95

The matplotlib documentation (from your code I assume that is what you are using) describes the first two parameters of matplotlib.pyplot.scatter as:

x, y : float or array-like, shape (n, )
The data positions.

So for your application you need to draw two scatter plots on the same graph, calling matplotlib.pyplot.scatter twice: first with y_test as y and color='red', then with y_pred as y and color='blue'.

Unfortunately, you didn't say what the x values for y_test and y_pred are, but you will need them as well to define x in your plt.scatter calls.

Drawing two scatter plots on the same graph is slightly tricky and, as this answer says, requires a reference to an Axes object. For example (as given in that answer):

import matplotlib.pyplot as plt

x = range(100)
y = range(100,200)
fig = plt.figure()
ax1 = fig.add_subplot(111)

# Two scatter calls on the same Axes object, one per series.
ax1.scatter(x[:4], y[:4], s=10, c='b', marker="s", label='first')
ax1.scatter(x[40:],y[40:], s=10, c='r', marker="o", label='second')
plt.legend(loc='upper left')
plt.show()

Take a look at the matplotlib documentation and the mentioned answer for more details.
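
Applied to the data in the question, the Axes-based approach might look like the sketch below. The x values are an assumption (the row index), because the question does not provide any; the marker sizes and axis labels are likewise only illustrative.

```python
import matplotlib.pyplot as plt

# Values copied from the table in the question.
y_test = [6, 12.2, 0.8, 5.2]
y_pred = [1.56, 3.07, 2.78, 3.54]
# Assumption: no x values were given, so the row index is used.
x = range(len(y_test))

fig, ax = plt.subplots()
# Two scatter calls on the same Axes, one per series.
ax.scatter(x, y_test, s=30, c='r', marker='o', label='Original')
ax.scatter(x, y_pred, s=30, c='b', marker='s', label='Predicted')
ax.set_xlabel('Sample index')
ax.set_ylabel('Value')
ax.legend(loc='upper left')
# Call plt.show() to display the figure in an interactive session.
```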

Upvotes: 0

Nott

Reputation: 313

I cannot run your code, but a few things stand out at first glance. First of all, the data points in the graph you wanted are normalized. You need to divide all the data points in a column by the maximum value of that column.

You should also check the legend function in the documentation to add a legend like the one in the graph you wanted.
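
A rough sketch of both suggestions, using the four rows from the question: each column is divided by its own maximum, then both are plotted with a legend. The helper name `normalize_by_max` is hypothetical, and using the row index as the x-axis is an assumption.

```python
import matplotlib.pyplot as plt


def normalize_by_max(values):
    """Divide every value by the column maximum (assumes the maximum is positive)."""
    peak = max(values)
    return [v / peak for v in values]


# Values copied from the table in the question.
original = [6, 12.2, 0.8, 5.2]
predicted = [1.56, 3.07, 2.78, 3.54]

# Each column is scaled by its own maximum, so both end up in the range 0 to 1.
norm_original = normalize_by_max(original)
norm_predicted = normalize_by_max(predicted)

# Assumption: the row index serves as the x-axis.
x = range(len(original))
fig, ax = plt.subplots()
ax.scatter(x, norm_original, c='r', label='Original')
ax.scatter(x, norm_predicted, c='b', label='Predicted')
ax.set_ylabel('Value / column maximum')
ax.legend()
# Call plt.show() to display the figure in an interactive session.
```

Because each column is scaled separately, the plot shows the shape of each series rather than how far the predictions are from the original values in absolute terms.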

Upvotes: 0
