Reputation: 628
I would like to plot the difference between each actual value and its predicted value.
I have a pandas Series, y_test, which is one-dimensional and contains continuous values. Its index is not sequential (7618, 276, 7045, 6095, 2296, 7191, 1213, 2408, ...).
I also have a one-dimensional numpy array, y_pred, which contains the predictions for y_test. I would like to see the difference for each predicted value in a graph.
I tried this:
fig, ax1 = plt.subplots(figsize = (20,5))
ax1.bar(y_test, y_test.index, color = 'tab:orange')
ax1.set_ylabel('Actual',color = 'tab:orange')
ax2 = ax1.twinx()
ax2.bar(y_pred, y_test.index, color = 'tab:blue')
ax2.set_ylabel('Predicted',color = 'tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()
plt.show()
but it raises an error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
A bar chart, a scatter plot, or anything else is fine; I just want to look at all the values together.
The goal is to group the best-predicted values so that I can understand which feature values in my original data are easiest to predict.
Incidentally, if anyone can recommend the best way to get that information from XGBoost, let me know too.
Here is some data:
y_pred:
[10.410029 , 4.4897604, 29.77089 , 23.548471 , 27.415161 ,
56.28772 , 13.083108 , 38.086662 , 19.128792 , 42.49037 ,
65.15919 , 47.172436 , 39.517883 , 13.782948 , 121.52351 ,
8.388838 , 49.625607 , 24.28464 , 49.55232 , 34.797436]
y_test:
7618 9.88
276 2.69
7045 26.93
6095 23.49
2296 24.79
7191 57.09
1213 15.90
2408 46.26
5961 18.60
275 41.03
1707 66.25
2333 53.50
5717 40.60
1497 12.34
4937 121.93
2654 7.97
7442 53.65
7157 25.93
2141 54.28
4339 36.93
Thank you
Upvotes: 0
Views: 5740
Reputation: 12524
I assume y_test has a 'val' column, where the values you want to plot are stored.
Maybe this will help: the index goes on the x axis, and the actual and predicted values go on the two y axes.
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize = (20,5))
# Actual values on the left y axis
ax1.plot(y_test.index, y_test['val'], color = 'tab:orange')
ax1.set_ylabel('Actual',color = 'tab:orange')
# Predicted values on a second y axis that shares the same x axis
ax2 = ax1.twinx()
ax2.plot(y_test.index, y_pred, color = 'tab:blue')
ax2.set_ylabel('Predicted',color = 'tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()
plt.show()
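If the goal is to see the per-point difference directly, one option is to compute the error for each row and plot it against row position instead of the raw index. The following is only a sketch. It assumes y_test is a pandas Series as described in the question (not a DataFrame with a 'val' column), and the helper names prediction_errors and plot_errors are made up for illustration. The data is the first five rows from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def prediction_errors(y_test, y_pred):
    """Build a table of actual, predicted, and error values, keeping y_test's index."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if len(y_pred) != len(y_test):
        raise ValueError("y_test and y_pred must have the same length")
    df = pd.DataFrame(
        {"actual": y_test.to_numpy(dtype=float), "predicted": y_pred},
        index=y_test.index,
    )
    df["error"] = df["predicted"] - df["actual"]
    df["abs_error"] = df["error"].abs()
    return df

def plot_errors(df):
    """Bar chart of the signed error, one bar per row, labelled with the original index."""
    positions = np.arange(len(df))  # 0, 1, 2, ... avoids the unordered index on the x axis
    fig, ax = plt.subplots(figsize=(20, 5))
    ax.bar(positions, df["error"], color="tab:blue")
    ax.axhline(0, color="black", linewidth=0.8)
    ax.set_xticks(positions)
    ax.set_xticklabels(df.index.astype(str), rotation=90)
    ax.set_xlabel("y_test index")
    ax.set_ylabel("Predicted - Actual")
    ax.set_title("Per-point prediction error")
    fig.tight_layout()
    return fig, ax

# First five rows of the sample data from the question
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])

errors = prediction_errors(y_test, y_pred)
fig, ax = plot_errors(errors)

# Rows sorted so that the best-predicted points (smallest absolute error) come first
best_first = errors.sort_values("abs_error")
```

Call plt.show() to display the figure. Because best_first keeps the original index, its index labels could be used to look up the matching rows in the original feature table.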
Upvotes: 1
Reputation: 262484
plt.scatter(y_test, y_pred)?
Points close to the equality line (the diagonal) indicate good predictions; points far from it indicate poor ones.
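To make the diagonal easier to judge by eye, the equality line can be drawn explicitly. This is a minimal sketch, not the only way to do it. The function name plot_actual_vs_predicted is hypothetical, and the data is the first five rows from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(y_true, y_pred):
    """Scatter predicted against actual values and draw the predicted = actual line."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    # Use a common range so the diagonal spans all of the points
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, color="tab:blue", label="predictions")
    ax.plot([lo, hi], [lo, hi], color="tab:orange", linestyle="--",
            label="predicted = actual")
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_aspect("equal")  # same scale on both axes, so the diagonal sits at 45 degrees
    ax.legend()
    return fig, ax

# First five rows of the sample data from the question
y_true = [9.88, 2.69, 26.93, 23.49, 24.79]
y_pred = [10.410029, 4.4897604, 29.77089, 23.548471, 27.415161]

fig, ax = plot_actual_vs_predicted(y_true, y_pred)
```

Call plt.show() to display it. Points above the dashed line are over-predictions and points below it are under-predictions.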
Upvotes: 2