Sabor117

Reputation: 135

Colouring specific points on a scatterplot with ggplot2

I have been scratching my head at this for ages and I cannot figure out for the life of me what I'm doing wrong.

I'm aware this is very similar to a few other questions, most notably "How to plot specific colors and shapes for ggplot2 scatter plot?", but it's by following the answer to that question that I've arrived at my current problem, and I have no idea what's gone wrong.

So, here is my data:

comb_frame <- structure(list(decode_beta = c("0.00279501", "-0.0098421", "-0.025254", 
                                             "0.00172701", "0.00531102", "0.000274217", "0.00594772859800487", 
                                             "0.000376995", "0.00082946", "0.00357124647463984", "-0.0018971", 
                                             "0.0083565", "0.00356544", "-0.000609096", "0.00167749", "-0.0150423", 
                                             "-0.022448", "-0.00242648", "-0.00190033", "-0.022692", "0.00536424", 
                                             "-0.00100278", "0.0073661", "0.00092082", "-0.00263694", "0.0076137", 
                                             "0.0072423", "-0.00081708", "-0.01708", "0.00211079", "0.0011098", 
                                             "-0.000107087", "0.0022284", "0.00068709", "-0.00562316159145804", 
                                             "0.00112658", "0.00207365", "-0.000287835", "-0.00286597", "-0.027999", 
                                             "0.00503866", "0.00305786", "-0.001238", "0.0071804", "-0.0084529", 
                                             "0.00556481", "-1.9459e-05", "0.000191271", "-0.017995", "0.002799", 
                                             "-0.024888", "-0.008418", "0.02257", "-0.008174", "-0.019886", 
                                             "-0.00492105", "0.00362115", "0.00392446", "0.00281645"), scallop_beta = c(-0.01011621546, 
                                                                                                                        0.0047657725, -0.02134944, -0.0016247829, 0.0044858415, -0.0015072187, 
                                                                                                                        -0.00782423635, -0.0013813875, -0.001077867, 0.02124057075, 0.0019690364, 
                                                                                                                        -0.004913727, 0.00098559246, 0.00302699872, -0.000395703, -0.02609645934, 
                                                                                                                        -0.02794527222, 0.000946532, 0.000786876, -0.00685633312, -0.004700096, 
                                                                                                                        0.00198448425, 0.00497280424, -0.00480984096, -0.00251334656, 
                                                                                                                        8.4434e-05, 0.00185996837, 0.001175848, -0.01947989552, -0.001227005, 
                                                                                                                        -0.0038851968, -0.00650484, -0.00262378296, 0.003949936, 0.0113079946, 
                                                                                                                        -0.00216854672, -0.000730496, 0.001289556, 0.004527388, -0.01095271456, 
                                                                                                                        0.00580293467, 0.00515290737, 0.000929589, -0.00292289712, 0.0053226888, 
                                                                                                                        -3.969984e-05, -0.0115784, 0.0030260514, -0.00695347872, 0.0092864585, 
                                                                                                                        -0.01863179184, 7.274624e-05, 0.00208976, 0.00042348704, -0.00965808, 
                                                                                                                        -0.0048684602, 0.0045743228, 0.00489489, -0.002105883), significance = c("SCAL SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "SCAL SIG", "NON SIG", "DEC SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "DEC SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "SCAL SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "SCAL SIG", "NON SIG", "DEC SIG", "SCAL SIG", "NON SIG", 
                                                                                                                                                                                                 "DEC SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "SCAL SIG", "SCAL SIG", "NON SIG")), row.names = c(NA, 
                                                                                                                                                                                                                                                               -59L), class = "data.frame")

I am trying to create a scatterplot of the two sets of betas and then colour them by their respective significance in two separate data sets (defined by the third column).

Based on the question I shared I do this:

comb_frame$significance = factor(comb_frame$significance, levels = (unique(comb_frame$significance))) ### First I changed significance into a factor

frame_colours = ifelse(comb_frame$significance == "DEC SIG", "#FF0000", ifelse(comb_frame$significance == "SCAL SIG", "#00A08A", "Gray")) ### I make a vector of the three colours I want

### Then I plot my graph as follows:

library(ggplot2)

ggplot(comb_frame, aes(x = decode_beta, y = scallop_beta)) +
    theme_classic() +
    labs(x = "DeCODE beta (adjusted)",
         y = "SCALLOP beta (adjusted)",
         title = paste0("Proteomics PheWAS correlations ", curr_path)) +
    geom_abline(intercept = 0) +
    geom_smooth(method = "lm", se = FALSE, colour = "red") +
    geom_point(aes(colour = significance)) + 
    scale_color_manual(breaks = unique(comb_frame$significance), values = frame_colours)

This very nearly works and produces the following:

[Image: plot output]

But as you can see, it is only colouring some of the points. It's colouring those points correctly, but it's then not adding the third colour for some reason and I cannot figure out what's gone wrong.

I have also tried this with the significance column left as a plain character column rather than a factor, with the same results.

Upvotes: 1

Views: 815

Answers (1)

chemdork123

Reputation: 13823

OP, the vector supplied to the values= argument of scale_color_manual() is used to map colours onto the distinct values of whatever is mapped to the colour aesthetic (i.e. comb_frame$significance). There are only 3 levels in comb_frame$significance ("SCAL SIG", "NON SIG", and "DEC SIG"), yet frame_colours is a vector with 59 values, one per row. Consequently, only the first three values of that vector are used, and they are mapped to the three levels in the order of the levels themselves.

The first 3 values in frame_colours are:

"#00A08A" "Gray"    "Gray" 
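If it helps to see the mismatch directly, here is a minimal base-R sketch. It uses a short, made-up stand-in for comb_frame$significance (five hypothetical rows with the same three labels) rather than the real data, and the names n_levels, n_colours and paired are purely illustrative:

```r
# Hypothetical stand-in for comb_frame$significance (same labels, fewer rows)
significance <- c("SCAL SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG")
significance <- factor(significance, levels = unique(significance))

# Per-row colour vector, built the same way as in the question
frame_colours <- ifelse(significance == "DEC SIG", "#FF0000",
                 ifelse(significance == "SCAL SIG", "#00A08A", "Gray"))

n_levels  <- nlevels(significance)   # number of groups to colour
n_colours <- length(frame_colours)   # one entry per row, not per group

# Pair the leading entries with the levels, in level order
paired <- setNames(head(frame_colours, n_levels), levels(significance))
print(paired)
```

In this stand-in, "DEC SIG" ends up paired with "Gray" because the third row happens to be "NON SIG", which is the same thing that happens with the full 59-row vector.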

So that's why you see that green color (#00A08A) while all the other points look gray. What you want to do is set values= to a vector that maps a color directly to each level. I find it's easiest to do this via a named vector. Try replacing your frame_colours = ifelse(...) line with this:

frame_colours = c(
  "DEC SIG"="#FF0000",
  "SCAL SIG"= "#00A08A",
  "NON SIG"="Gray")

Running the plot code then gives you this:

[Image: resulting plot]

You don't have to supply a named vector, but you must specify in values a vector that has at least as many items as there are levels in what is mapped to the color aesthetic.
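As a rough sketch of the unnamed alternative: the colours are listed once per level, in the same order as levels() of the factor. The demo data frame and level_colours below are made up for illustration (with numeric betas) and are not the OP's data:

```r
library(ggplot2)

# Hypothetical stand-in data with the same three labels
demo <- data.frame(
  decode_beta  = c(0.003, -0.010, -0.025, 0.005, 0.002),
  scallop_beta = c(-0.010, 0.005, -0.021, 0.004, -0.002),
  significance = c("SCAL SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG")
)
demo$significance <- factor(demo$significance,
                            levels = unique(demo$significance))

# One colour per level, in the order of levels(demo$significance):
# "SCAL SIG", "NON SIG", "DEC SIG"
level_colours <- c("#00A08A", "Gray", "#FF0000")

p <- ggplot(demo, aes(x = decode_beta, y = scallop_beta)) +
  geom_point(aes(colour = significance)) +
  scale_color_manual(values = level_colours)

# print(p) draws the plot
```

The drawback is that the result depends on the level order, so reordering the factor silently changes which group gets which colour. That is the main reason I prefer the named vector.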

Upvotes: 2
