SJDS

Plotting points with size adapting to number of data points (cex)

I have the following data from a tree analysis:

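train <- sample(1:nrow(dd), 1010)                      # row indices used for training
yhat1 <- predict(tree.model1, newdata = dd[-train, ])  # predictions for the held-out rows
v10.test <- dd$v10[-train]                             # actual responses for the held-out rows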

dd is my data frame, v10 is the (discrete) response variable, which takes values from 0 to 10, and train is a random sample of row indices drawn from dd.

I want to plot the predictions yhat1 against the actual test values v10.test, with the size of each point reflecting how many test observations have that actual value and were assigned that yhat1 as their prediction.

Thus:

plot(yhat1, v10.test, cex = ???)

The values I need for cex can be taken from the table object, but I don't know how to extract them. Any ideas?

table(yhat1, v10.test)
                 v10.test
yhat1               0  1  2  3  4  5  6  7  8  9 10
  2.99479166666667 17 26  7 21 10  8  7  7  8  3  6
  4.36725663716814  8 15 21 14 14 14 13 12  4  5  4
  4.75              1  1  3  1  0  2  2  2  1  1  0
  4.82710280373832  6 10  5 11  7 11 11 18 22  3  2
  5.73684210526316  1  5  1  9  7 13 10  7 12  7 12
  6.68              0  1  0  1  0  3  1  1  0  0  1
  6.92045454545455  0  2  3  2  5  5  4  7  6  9  6

Answers (2)

Greg Snow

The symbols function may be preferable to plot with cex when you want the size of the points to depend on an additional variable. Note that you will generally get the best representation by using the square root of that variable to determine the size, so that the area of each symbol is proportional to the variable.
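A minimal sketch of the symbols() approach, assuming the counts come from table(yhat1, v10.test). Because dd and tree.model1 are not available here, yhat1 and v10.test are replaced by simulated stand-in values (the prediction values below are made up). The table is turned into one row per (prediction, actual) cell, and each circle's radius is the square root of the cell count, so circle area is proportional to the count:

```r
# Hypothetical stand-in data: simulated because dd and tree.model1 are not available
set.seed(1)
yhat1 <- sample(c(2.99, 4.37, 4.83, 5.74, 6.92), 500, replace = TRUE)
v10.test <- sample(0:10, 500, replace = TRUE)

# One row per (prediction, actual) combination, with its count in Freq
cells <- as.data.frame(table(yhat1, v10.test))
cells <- cells[cells$Freq > 0, ]  # drop empty cells

# table() stores the values as factor levels; convert them back to numbers
x <- as.numeric(as.character(cells$yhat1))
y <- as.numeric(as.character(cells$v10.test))

# Radius = sqrt(count), so area is proportional to count;
# inches = 0.25 scales the largest circle to a radius of 0.25 inch
symbols(x, y, circles = sqrt(cells$Freq), inches = 0.25, bg = "grey",
        xlab = "prediction (yhat1)", ylab = "actual (v10.test)")
```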

SJDS

I played around a bit more, and it turns out my main problem was not the table but the default pch and the default point size, which made the resulting graph impossible to interpret.

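So a simple way of doing it is to plot one point per non-empty cell of the table, with cex scaled by the cell count:

tab <- subset(as.data.frame(table(yhat1, v10.test)), Freq > 0)  # one row per (prediction, actual) cell, count in Freq
plot(as.numeric(as.character(tab$yhat1)), as.numeric(as.character(tab$v10.test)), pch = 20, cex = tab$Freq / 10)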

That does the trick (and shows how poor the data fit is)
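Another option keeps the plot(yhat1, v10.test, cex = ...) call from the question. This is a sketch under the same assumption of simulated stand-in data (the values are made up): ave() gives every observation the size of its (prediction, actual) group, so cex has exactly one value per plotted point and all the overplotted points in a cell are drawn at the same size.

```r
# Hypothetical stand-in data: simulated because dd and tree.model1 are not available
set.seed(1)
yhat1 <- sample(c(2.99, 4.37, 4.83, 5.74, 6.92), 500, replace = TRUE)
v10.test <- sample(0:10, 500, replace = TRUE)

# For each observation, count how many observations share its (prediction, actual) pair.
# ave() applies FUN within each group and returns a vector as long as its first argument.
n <- ave(seq_along(yhat1), yhat1, v10.test, FUN = length)

# sqrt keeps the symbol area, not its width, proportional to the count
plot(yhat1, v10.test, pch = 20, cex = sqrt(n) / 2)
```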
