Bigboss01

Reputation: 628

Python plot 1D array

I would like to plot the difference between each individual point.

I have one series y_test which is one-dimensional and contains continuous values. The index is kinda whacky (7618, 276, 7045, 6095, 2296, 7191, 1213, 2408...).

I also have a one-dimensional numpy array, y_pred, which contains the predictions for y_test. I would like a graph that shows how far each predicted value is from the actual value.

I tried this:

fig, ax1 = plt.subplots(figsize = (20,5))
ax1.bar(y_test, y_test.index, color = 'tab:orange')
ax1.set_ylabel('Actual',color = 'tab:orange')
ax2 = ax1.twinx()
ax2.bar(y_pred, y_test.index, color = 'tab:blue')
ax2.set_ylabel('Predicted',color = 'tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()
plt.show()

but it raises an error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Bar, scatter, or anything else is fine; I just want to look at all the values together.

This is so that I can group the best predicted values to understand which feature values within my original data are easiest to predict with.

If, incidentally, anyone could recommend the best XGBoost way of getting that information let me know too.

Here is some data:

y_pred:
[ 10.410029,   4.4897604,  29.77089 ,  23.548471,  27.415161,
  56.28772 ,  13.083108 ,  38.086662,  19.128792,  42.49037 ,
  65.15919 ,  47.172436 ,  39.517883,  13.782948, 121.52351 ,
   8.388838,  49.625607 ,  24.28464 ,  49.55232 ,  34.797436]

y_test:
7618      9.88
276       2.69
7045     26.93
6095     23.49
2296     24.79
7191     57.09
1213     15.90
2408     46.26
5961     18.60
275      41.03
1707     66.25
2333     53.50
5717     40.60
1497     12.34
4937    121.93
2654      7.97
7442     53.65
7157     25.93
2141     54.28
4339     36.93

Thank you

Upvotes: 0

Views: 5740

Answers (2)

Zephyr

Reputation: 12524

I assume y_test has a 'val' column that stores the values you want to plot (if y_test is a plain Series, pass y_test itself instead of y_test['val']).
Maybe this could be helpful?
It puts the index on the x axis and draws the true and predicted values against two separate y axes.

import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(20, 5))

# Actual values on the left y axis
ax1.plot(y_test.index, y_test['val'], color='tab:orange')
ax1.set_ylabel('Actual', color='tab:orange')
# Predicted values on a second y axis that shares the same x axis
ax2 = ax1.twinx()
ax2.plot(y_test.index, y_pred, color='tab:blue')
ax2.set_ylabel('Predicted', color='tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()

plt.show()
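
A possible variation, in case the two independently scaled y axes make the comparison hard to read: draw both series as side-by-side bars on a single axis, using positions 0..n-1 for x and the original index only as tick labels. This is a minimal sketch built on the sample data from the question. It assumes y_test is a plain Series, and the names pos and width are only illustrative. The shape check is included because bar raises the shape-mismatch ValueError when x and height cannot be broadcast together, for example when y_pred is not 1-D or has a different length than y_test.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question
y_pred = np.array([
    10.410029, 4.4897604, 29.77089, 23.548471, 27.415161,
    56.28772, 13.083108, 38.086662, 19.128792, 42.49037,
    65.15919, 47.172436, 39.517883, 13.782948, 121.52351,
    8.388838, 49.625607, 24.28464, 49.55232, 34.797436,
])
y_test = pd.Series(
    [9.88, 2.69, 26.93, 23.49, 24.79, 57.09, 15.90, 46.26, 18.60, 41.03,
     66.25, 53.50, 40.60, 12.34, 121.93, 7.97, 53.65, 25.93, 54.28, 36.93],
    index=[7618, 276, 7045, 6095, 2296, 7191, 1213, 2408, 5961, 275,
           1707, 2333, 5717, 1497, 4937, 2654, 7442, 7157, 2141, 4339],
)

# Both inputs must be 1-D and equally long, otherwise bar() cannot pair them up
assert y_pred.ndim == 1 and len(y_pred) == len(y_test)

# Plot by position so the unordered index does not scatter the bars along x
pos = np.arange(len(y_test))
width = 0.4  # illustrative bar width; two bars share each position

fig, ax = plt.subplots(figsize=(20, 5))
ax.bar(pos - width / 2, y_test.to_numpy(), width, color="tab:orange", label="Actual")
ax.bar(pos + width / 2, y_pred, width, color="tab:blue", label="Predicted")

# Show the original index values as tick labels
ax.set_xticks(pos)
ax.set_xticklabels(y_test.index.astype(str), rotation=90)
ax.set_xlabel("Index of y_test")
ax.set_ylabel("Value")
ax.set_title("XGBoost Regression Performance")
ax.legend()
fig.tight_layout()
plt.show()
```

With a shared axis, the height gap between each orange and blue pair is the prediction error for that row.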


Upvotes: 1

mozway

Reputation: 262484

plt.scatter(y_test, y_pred)?

Points close to the equality line (the diagonal) indicate good predictions; points far from it indicate poor ones.
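
To make that concrete, here is a rough sketch using the sample data from the question. It draws the scatter with the diagonal added, and also ranks the rows by absolute error so the best-predicted rows can be looked up in the original data by index. The names abs_err and lims are only illustrative, and ranking by absolute error is just one possible way to define "best predicted".

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question
y_pred = np.array([
    10.410029, 4.4897604, 29.77089, 23.548471, 27.415161,
    56.28772, 13.083108, 38.086662, 19.128792, 42.49037,
    65.15919, 47.172436, 39.517883, 13.782948, 121.52351,
    8.388838, 49.625607, 24.28464, 49.55232, 34.797436,
])
y_test = pd.Series(
    [9.88, 2.69, 26.93, 23.49, 24.79, 57.09, 15.90, 46.26, 18.60, 41.03,
     66.25, 53.50, 40.60, 12.34, 121.93, 7.97, 53.65, 25.93, 54.28, 36.93],
    index=[7618, 276, 7045, 6095, 2296, 7191, 1213, 2408, 5961, 275,
           1707, 2333, 5717, 1497, 4937, 2654, 7442, 7157, 2141, 4339],
)

# Absolute error per row; the subtraction is positional and keeps y_test's
# index, so each error can be traced back to a row of the original data
abs_err = (y_test - y_pred).abs().sort_values()
print(abs_err.head())  # rows with the smallest error come first

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y_test, y_pred, color="tab:blue")

# Equality line: a perfect prediction would fall exactly on it
lims = [0, max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, color="tab:orange", linestyle="--", label="y_pred = y_test")

ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
ax.set_title("XGBoost Regression Performance")
ax.legend()
fig.tight_layout()
plt.show()
```

The sorted index (abs_err.index) can then be used to select the corresponding rows of the feature table, for example with .loc, assuming that table shares the same index as y_test.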

Upvotes: 2
