Reputation: 1289
I have the following data from a tree analysis:
train = sample(1:nrow(dd),1010)
yhat1 <- predict(tree.model1,newdata=dd[-train,])
v10.test <- dd$v10[-train]
dd is my data.frame, v10 is the (discrete) response variable that varies between 1 and 10, and train is a sample drawn from my dataframe.
I want to plot the predictions yhat1 against the actual test values v10.test, with the size of each point reflecting the number of actual test values that received that yhat1 as their prediction.
Thus:
plot(yhat1, v10.test, cex = ???)
The values for cex that I need can be drawn from the table object, but I don't know how. Any ideas?
table(yhat1, v10.test)
                  v10.test
yhat1               0  1  2  3  4  5  6  7  8  9 10
  2.99479166666667 17 26  7 21 10  8  7  7  8  3  6
  4.36725663716814  8 15 21 14 14 14 13 12  4  5  4
  4.75              1  1  3  1  0  2  2  2  1  1  0
  4.82710280373832  6 10  5 11  7 11 11 18 22  3  2
  5.73684210526316  1  5  1  9  7 13 10  7 12  7 12
  6.68              0  1  0  1  0  3  1  1  0  0  1
  6.92045454545455  0  2  3  2  5  5  4  7  6  9  6
Upvotes: 1
Views: 638
Reputation: 49640
The symbols function may be preferable to using plot and cex when you want the size of points to depend on an additional variable. Note that you will generally get the best representation when you use the square root of the variable to determine size (so that the area of each symbol is proportional to the variable).
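A minimal sketch of that idea, assuming the counts are first reshaped into one row per (prediction, actual) combination. The two short vectors are made-up stand-ins for the question's yhat1 and v10.test, and the counts data frame is a name introduced here for illustration only.

```r
# Toy stand-ins for the question's vectors (made-up values, not the real data)
yhat1    <- c(3.0, 3.0, 3.0, 4.4, 4.4, 5.7, 5.7, 5.7, 5.7)
v10.test <- c(1,   1,   2,   5,   5,   9,   9,   9,   3)

# One row per (prediction, actual) combination, with its frequency in Freq
counts <- as.data.frame(table(yhat1, v10.test), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]               # drop empty cells
counts$yhat1    <- as.numeric(counts$yhat1)       # labels back to numbers
counts$v10.test <- as.numeric(counts$v10.test)

# Radius proportional to sqrt(Freq), so circle area is proportional to Freq;
# inches = 0.2 scales the largest circle to a radius of 0.2 inches
symbols(counts$yhat1, counts$v10.test, circles = sqrt(counts$Freq),
        inches = 0.2, xlab = "yhat1", ylab = "v10.test")
```

Converting the table labels back with as.numeric can lose a few trailing digits of the predicted values, which should not matter for plotting.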
Upvotes: 1
Reputation: 1289
I played around a bit more and it turns out my main problem was not with the table but with the standard settings for pch and the standard size of the points, which made the resulting graph impossible to interpret.
So a simple way of doing it is
plot(yhat1, v10.test, pch = 20, cex = table(yhat1, v10.test)/10)
That does the trick (and shows how poor the data fit is)
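One caveat, as far as I can tell: cex is applied to the points in order and recycled, so a whole table passed to cex is read cell by cell rather than matched to each observation. A hedged sketch of one way to give every point the count of its own cell is to index the table by each observation's row and column labels. The vectors below are made-up stand-ins for the real data, and tab and n are names introduced here.

```r
# Toy stand-ins for the question's vectors (made-up values, not the real data)
yhat1    <- c(3.0, 3.0, 3.0, 4.4, 4.4, 5.7, 5.7, 5.7, 5.7)
v10.test <- c(1,   1,   2,   5,   5,   9,   9,   9,   3)

tab <- table(yhat1, v10.test)

# Look up, for every observation, the count in its (yhat1, v10.test) cell.
# A two-column character matrix indexes the table by its dimnames.
n <- tab[cbind(as.character(yhat1), as.character(v10.test))]

# Square root so that point area, not diameter, tracks the count
plot(yhat1, v10.test, pch = 20, cex = sqrt(n))
```

With the real data the counts are much larger, so a divisor (such as the /10 above) or a smaller scaling factor is probably still needed to keep the points readable.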
Upvotes: 1