peakstatus

Reputation: 441

R - Color plot points by index

I'm doing some model comparisons across 3 different modeling methods based on their cross-validation error rates. I'm creating a plot for a report to show the difference in error rates across the 3 modeling methods. Is there a way I can change the colors of the points for each modeling method to match my legend? All of my values are stored in a single vector.

Input data:

 [1] 0.3121693 0.3174603 0.3121693 0.3068783 0.2592593 0.3015873 0.3068783 0.3068783 0.3121693 0.3386243 0.3650794 0.3227513 0.3174603 0.3333333 0.3492063 0.3492063 0.3121693 0.3174603 0.3121693
[20] 0.3015873 0.2751323 0.3015873 0.3015873 0.3068783

So:

Models 1-8 = Red

Models 9-16 = Blue

Models 17-24 = Green

Current Code:

plot(allcv10, pch = 20, xlab = "Model Number", ylab = "CV Error Rate",
     main = "Comparison of Error Rates")
abline(v = c(8.5, 16.5))
legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
       col = c("red", "blue", "green"), pch = 20, cex = 0.8)

Current Plot: (image not available)

Upvotes: 0

Views: 1479

Answers (3)

Edward

Reputation: 18683

If the data change, or if models are added or removed, you don't want to have to change the commands that create the graph. The graphing commands should work regardless of such changes to the data.

Suppose this is the data.

df_cv
   index   allcv10 Model
1      1 0.3121693     1
2      2 0.3174603     1
3      3 0.3121693     1
4      4 0.3068783     1
5      5 0.2592593     1
6      6 0.3015873     1
7      7 0.3068783     1
8      8 0.3068783     1
9      9 0.3121693     2
10    10 0.3386243     2
11    11 0.3650794     2
12    12 0.3227513     2
13    13 0.3174603     2
14    14 0.3333333     2
15    15 0.3492063     2
16    16 0.3492063     2
17    17 0.3121693     3
18    18 0.3174603     3
19    19 0.3121693     3
20    20 0.3015873     3
21    21 0.2751323     3
22    22 0.3015873     3
23    23 0.3015873     3
24    24 0.3068783     3
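The answer doesn't show how df_cv is built. A minimal sketch, assuming (as in the question) that models 1-8, 9-16 and 17-24 belong to methods 1, 2 and 3 respectively:

```r
# CV error rates from the question, in model order
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

# Assumption: 8 consecutive models per method, methods numbered 1, 2, 3
df_cv <- data.frame(index   = seq_along(allcv10),
                    allcv10 = allcv10,
                    Model   = rep(1:3, each = 8))
head(df_cv, 3)
```

If the groups are not all the same size, rep() also accepts a vector of counts, e.g. rep(1:3, times = c(8, 8, 8)).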

The colors for the three models should be specified as a vector independently of the data.

cols <- c("red", "blue", "green")

This will facilitate creating a legend as well.

plot(allcv10~index, data=df_cv, xlab="Model Number", ylab="CV Error Rate",
     main="Comparison of Error Rates", pch=20, 
     col = cols[Model]) # use Model to index the color vector

legend("topright", legend=c("LDA", "QDA", "Logistic Regression"), 
     col = cols, pch=20, cex=0.8)
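The vertical dividers from the question (abline(v = c(8.5, 16.5))) can be derived from the Model column in the same spirit, instead of being hard-coded. A sketch, assuming the rows are sorted by Model as in df_cv above:

```r
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)
df_cv <- data.frame(index = seq_along(allcv10), allcv10 = allcv10,
                    Model = rep(1:3, each = 8))
cols <- c("red", "blue", "green")

plot(allcv10 ~ index, data = df_cv, xlab = "Model Number",
     ylab = "CV Error Rate", main = "Comparison of Error Rates",
     pch = 20, col = cols[df_cv$Model])

# Number of models per method (8, 8, 8 here)
sizes  <- as.vector(table(df_cv$Model))
# A divider after every group except the last one: 8.5 and 16.5
breaks <- cumsum(sizes)[-length(sizes)] + 0.5
abline(v = breaks)

legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
       col = cols, pch = 20, cex = 0.8)
```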

If you want to change the colors, you only need to change the cols vector, not the data. And if a fourth modelling type (Model = 4) enters the fray, you simply add another color to the cols vector (and its name to the legend labels); the plotting commands themselves don't need to be changed.
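A small illustration of why cols must have one entry per model type. The model vector and the colour "purple" below are hypothetical, chosen only to show the indexing:

```r
cols  <- c("red", "blue", "green")
model <- c(1, 2, 3, 4)   # hypothetical: a fourth model type appears

cols[model]              # 4th entry is NA: there is no 4th colour yet

cols <- c(cols, "purple")  # hypothetical colour for the new type
cols[model]              # now every model type has a colour
```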

This is also how ggplot2 works: the color is mapped from a variable in the data, using a color vector with one entry per level of that variable, rather than an independent color vector as long as the variable itself.
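The same "one colour per level" idea works in base R with a factor. This is a sketch that assumes the method names are stored as a factor (the method variable is hypothetical, not part of the question's data); in ggplot2 the analogous step would be mapping colour in aes() and supplying the colours with scale_colour_manual().

```r
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

# Hypothetical method labels; the levels fix the order of colours and legend
methods <- c("LDA", "QDA", "Logistic Regression")
method  <- factor(rep(methods, each = 8), levels = methods)

# One colour per level, not per observation
cols <- c("red", "blue", "green")

# Indexing by a factor uses its integer codes, so each point
# gets the colour of its level
point_cols <- cols[method]

plot(seq_along(allcv10), allcv10, pch = 20, col = point_cols,
     xlab = "Model Number", ylab = "CV Error Rate",
     main = "Comparison of Error Rates")
legend("topright", legend = levels(method), col = cols, pch = 20, cex = 0.8)
```

Because the legend labels come from levels(method), they stay in step with the colours automatically.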

Upvotes: 1

cbo

Reputation: 1763

I advise using a data.frame, which gives you more control over the plot parameters:

allcv10 <- c(0.3121693,
        0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
        0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794,
        0.3227513, 0.3174603, 0.3333333, 0.3492063, 0.3492063,
        0.3121693, 0.3174603, 0.3121693, 0.3015873, 0.2751323,
        0.3015873, 0.3015873, 0.3068783
)

colors <- rep(c("red", "blue", "green"), each = 8)

stopifnot(length(allcv10) == length(colors))  # one colour per model
df_cv <- data.frame(index = seq_along(allcv10), allcv10, colors, stringsAsFactors = FALSE)

plot(x = df_cv$index, y = df_cv$allcv10, pch = 20, xlab = "Model Number", ylab = "CV Error Rate",
     main = "Comparison of Error Rates",
     col = df_cv$colors)
abline(v = c(8.5, 16.5))
legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
       col = c("red", "blue", "green"), pch = 20, cex = 0.8)

Upvotes: 2

Allan Cameron

Reputation: 173803

You can add points of any colour using points:

allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873, 
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513, 
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)

plot(allcv10[1:8], pch = 20, xlab = "Model Number", ylab = "CV Error Rate",
     main = "Comparison of Error Rates", xlim = c(1, 24), ylim = range(allcv10),
     col = "red")
points(9:16, allcv10[9:16], col = "blue", pch = 20)
points(17:24, allcv10[17:24], col = "green", pch = 20)
abline(v = c(8.5, 16.5))
legend("topright", legend = c("LDA", "QDA", "Logistic Regression"),
       col = c("red", "blue", "green"), pch = 20, cex = 0.8)
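The same approach generalises to any number of methods with a loop: draw an empty plot sized to all the data, then one points() call per method. A sketch, assuming 8 consecutive models per method as in the question (the group vector is a hypothetical helper):

```r
allcv10 <- c(0.3121693, 0.3174603, 0.3121693, 0.3068783, 0.2592593, 0.3015873,
             0.3068783, 0.3068783, 0.3121693, 0.3386243, 0.3650794, 0.3227513,
             0.3174603, 0.3333333, 0.3492063, 0.3492063, 0.3121693, 0.3174603,
             0.3121693, 0.3015873, 0.2751323, 0.3015873, 0.3015873, 0.3068783)
cols    <- c("red", "blue", "green")
methods <- c("LDA", "QDA", "Logistic Regression")
# Hypothetical grouping: method number for each model
group   <- rep(seq_along(methods), each = 8)

# type = "n" sets up the axes for all points without drawing any
plot(seq_along(allcv10), allcv10, type = "n", xlab = "Model Number",
     ylab = "CV Error Rate", main = "Comparison of Error Rates")
for (g in seq_along(methods)) {
  idx <- which(group == g)          # models belonging to method g
  points(idx, allcv10[idx], col = cols[g], pch = 20)
}
abline(v = c(8.5, 16.5))
legend("topright", legend = methods, col = cols, pch = 20, cex = 0.8)
```

Since the axes come from the full vector, there is no need to hard-code xlim or ylim.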

Created on 2020-02-28 by the reprex package (v0.3.0)

Upvotes: 2
