Reputation: 441
I'm comparing 3 different modeling methods based on their cross-validation error rates. I'm creating a plot for a report to show how the error rates differ across the 3 methods. Is there a way to change the color of the points for each modeling method so that they correspond with my legend? All of my values are stored in a single vector.
Input data:
[1] 0.3121693 0.3174603 0.3121693 0.3068783 0.2592593 0.3015873 0.3068783 0.3068783 0.3121693 0.3386243 0.3650794 0.3227513 0.3174603 0.3333333 0.3492063 0.3492063 0.3121693 0.3174603 0.3121693
[20] 0.3015873 0.2751323 0.3015873 0.3015873 0.3068783
So:
Models 1-8 = Red
Models 9-16 = Blue
Models 17-24 = Green
Current Code:
plot(allcv10, pch=20, xlab="Model Number", ylab="CV Error Rate", main="Comparison of Error Rates")
abline(v=c(8.5,16.5))
legend("topright", legend=c("LDA", "QDA", "Logistic Regression"),
       col=c("red", "blue","green"), pch=20, cex=0.8)
Upvotes: 0
Views: 1479
Reputation: 18683
If the data changes, or if models are added or deleted, you don't want to have to change the commands that create the graph. The plotting commands should work regardless of any changes to the data.
Suppose this is the data.
df_cv
index allcv10 Model
1 1 0.3121693 1
2 2 0.3174603 1
3 3 0.3121693 1
4 4 0.3068783 1
5 5 0.2592593 1
6 6 0.3015873 1
7 7 0.3068783 1
8 8 0.3068783 1
9 9 0.3121693 2
10 10 0.3386243 2
11 11 0.3650794 2
12 12 0.3227513 2
13 13 0.3174603 2
14 14 0.3333333 2
15 15 0.3492063 2
16 16 0.3492063 2
17 17 0.3121693 3
18 18 0.3174603 3
19 19 0.3121693 3
20 20 0.3015873 3
21 21 0.2751323 3
22 22 0.3015873 3
23 23 0.3015873 3
24 24 0.3068783 3
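A data frame like the one above could be built from the raw vector roughly as follows. This is only a sketch: it assumes the error rates are stored in blocks of 8 per method, in the order LDA, QDA, Logistic Regression, as described in the question.

```r
# Raw cross-validation error rates from the question
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

# Assumption: 8 consecutive values per modeling method (1 = LDA, 2 = QDA, 3 = Logistic Regression)
df_cv <- data.frame(
  index   = seq_along(allcv10),   # position of each value, used as the x axis
  allcv10 = allcv10,              # the error rates themselves
  Model   = rep(1:3, each = 8)    # integer code of the modeling method
)
```

If the block sizes differ, the Model column would have to be filled in some other way, for example from the names of the fitted models.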
The colors for the three models should be specified as a vector independently of the data.
cols <- c("red","blue","green")
This will facilitate creating a legend as well.
plot(allcv10~index, data=df_cv, xlab="Model Number", ylab="CV Error Rate",
main="Comparison of Error Rates", pch=20,
col = cols[Model]) # use Model to index the color vector
legend("topright", legend=c("LDA", "QDA", "Logistic Regression"),
col = cols, pch=20, cex=0.8)
If you want to change the colors, you only need to change the cols vector, not the data. And if a fourth modeling type (Model=4) enters the fray, then the cols vector can simply be extended with another color. The plotting commands, including the legend, don't need to be changed.
This is the way ggplot works. The color is specified using a variable of the data, and a color vector the same length as the number of levels of the variable, not by specifying an independent color vector the same length as the variable.
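For comparison, here is roughly what the same plot might look like in ggplot2. This is a sketch, not a definitive version: it assumes the ggplot2 package is installed, and the Method column and the named cols vector are introduced here for illustration only.

```r
library(ggplot2)

allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

# Assumption: 8 consecutive values per method, in this order
methods <- c("LDA", "QDA", "Logistic Regression")
df_cv <- data.frame(
  index   = seq_along(allcv10),
  allcv10 = allcv10,
  Method  = factor(rep(methods, each = 8), levels = methods)
)

# One color per factor level, not one per data point
cols <- c("LDA" = "red", "QDA" = "blue", "Logistic Regression" = "green")

p <- ggplot(df_cv, aes(x = index, y = allcv10, colour = Method)) +
  geom_point() +
  geom_vline(xintercept = c(8.5, 16.5)) +   # separators between the method blocks
  scale_colour_manual(values = cols) +      # map factor levels to the chosen colors
  labs(x = "Model Number", y = "CV Error Rate", title = "Comparison of Error Rates")
print(p)
```

The legend is generated from the Method factor, so it stays in sync with the point colors without a separate legend() call.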
Upvotes: 1
Reputation: 1763
I advise you to use a data.frame structure, which allows more control over the plot parameters:
allcv10 <- c(0.3121693,
0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794,
0.3227513, 0.3174603, 0.3333333, 0.3492063, 0.3492063,
0.3121693, 0.3174603, 0.3121693, 0.3015873, 0.2751323,
0.3015873, 0.3015873, 0.3068783
)
colors <- c(rep("red", 8), rep("blue", 8), rep("green", 8))
# length(allcv10) ; length(colors) ;
df_cv <- data.frame(index = seq_along(allcv10), allcv10, colors, stringsAsFactors = FALSE)
plot(x = df_cv$index, y = df_cv$allcv10, pch=20, xlab="Model Number", ylab="CV Error Rate",
     main="Comparison of Error Rates",
     col = df_cv$colors)
abline(v=c(8.5,16.5))
legend("topright", legend=c("LDA", "QDA", "Logistic Regression"),
col=c("red", "blue","green"), pch=20, cex=0.8)
Upvotes: 2
Reputation: 173803
You can add points of any colour using points():
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)
plot(allcv10[1:8], pch = 20, xlab = "Model Number", ylab = "CV Error Rate",
main = "Comparison of Error Rates", xlim = c(1, 24), ylim = c(0.25, 0.37),
col ="red")
points(9:16, allcv10[9:16], col = "blue", pch = 20)
points(17:24, allcv10[17:24], col = "green", pch = 20)
abline(v = c(8.5, 16.5))
legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
col = c("red", "blue", "green"), pch = 20, cex = 0.8)
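If the number of groups might grow, the repeated points() calls could be replaced by a loop. The following is a minimal sketch under the assumption that each method occupies 8 consecutive positions; the groups vector is a helper introduced here for illustration. Starting from an empty plot with type = "n" lets R pick the axis limits from the full data, so xlim and ylim don't need to be hard-coded.

```r
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

groups <- rep(1:3, each = 8)          # assumption: 8 consecutive models per method
cols   <- c("red", "blue", "green")   # one color per group

# Set up the axes from the full data without drawing any points yet
plot(seq_along(allcv10), allcv10, type = "n",
     xlab = "Model Number", ylab = "CV Error Rate",
     main = "Comparison of Error Rates")

# Add each group's points in its own color
for (g in seq_along(cols)) {
  idx <- which(groups == g)
  points(idx, allcv10[idx], col = cols[g], pch = 20)
}

abline(v = c(8.5, 16.5))
legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
       col = cols, pch = 20, cex = 0.8)
```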
Created on 2020-02-28 by the reprex package (v0.3.0)
Upvotes: 2