How can I color data points in R based on a pattern match?

Question

I have a data frame with this form:

             V1 V2                       V3          V4       V5         V6       V7           V8
1 0610007C21Rik  -   chr5:31351012-31356737 1.33732e-05 0.752381  0.9965090 0.000000 1.777419e-05
2 0610007L01Rik  - chr5:130695613-130717165 1.67168e+00 1.673120  0.0000000 3.453930 4.997847e-01
3 0610007P08Rik  -  chr13:63916627-64000808 7.06033e-01 0.000000  0.0815767 0.318051 1.000000e+00
4 0610007P14Rik  -  chr12:87157066-87165495 0.00000e+00 0.000000  0.0000000 5.494230          NaN
5 0610007P22Rik  -  chr17:25377114-25379603 4.99696e+00 0.908254  0.9076130 3.639250 8.461946e-01
6 0610009B22Rik  -  chr11:51499151-51502136 6.53363e-01 8.500980 13.5797000 0.000000 7.137192e-02

I am plotting log2(V4) vs. log2(V5) with this command:

plot(log2(df[,4]) ~ log2(df[,5]), xlim=c(0,10), ylim=c(0,10))

I want to color points based on a pattern match in V1. For instance, how can I color 0610007C21Rik and 0610007L01Rik green and 0610007P22Rik and 0610007P14Rik red? I have tried adding another column to the data frame with a color specified, but there's got to be an easier way.

thelatemail · Accepted Answer

Here's a base R solution:

Define your colours as a named character vector, with one entry for each value of df$V1 that you want to colour. Note the double quotes around each name: they are needed because the names start with a digit and so are not valid R names unquoted.

col.list <- c(
              "0610007C21Rik"="green",
              "0610007L01Rik"="green",
              "0610007P22Rik"="red",
              "0610007P14Rik"="red"
             )

Then plot away using df$V1 to look up the values in the col.list vector you have just defined.

plot(
     log2(df[,4]) ~ log2(df[,5]), 
     xlim=c(0,10),
     ylim=c(0,10),
     col=col.list[paste(df$V1)]  # paste() turns V1 into character so the lookup is by name
    )
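
The lookup above relies on indexing a named vector by character strings. Here is a minimal sketch of that behaviour, using a made-up ids vector in place of df$V1 (ids, ids.factor, point.cols and by.code are names assumed for illustration, not part of the original data). It shows two things: IDs with no entry in col.list come back as NA, and a factor must be converted to character first (which is what paste() does in the call above), because otherwise R indexes by the factor's integer level codes.

```r
# Toy stand-in for df$V1 (assumed data); the last ID is deliberately
# absent from col.list.
ids <- c("0610007C21Rik", "0610007P22Rik", "0610009B22Rik")

col.list <- c(
  "0610007C21Rik" = "green",
  "0610007L01Rik" = "green",
  "0610007P22Rik" = "red",
  "0610007P14Rik" = "red"
)

# Indexing by a character vector returns one colour per ID, in the order
# of the IDs; an ID with no matching name gives NA.
point.cols <- unname(col.list[ids])
print(point.cols)   # "green" "red" NA

# If the IDs are stored as a factor, indexing uses the integer level
# codes (here 1, 2, 3), so the first three entries of col.list are
# returned regardless of the names.
ids.factor <- factor(ids)
by.code <- unname(col.list[ids.factor])
print(by.code)      # "green" "green" "red"

# Converting to character restores the lookup by name.
by.name <- unname(col.list[as.character(ids.factor)])
print(by.name)      # "green" "red" NA
```

In base graphics an NA colour is generally treated as transparent, so points with no entry in col.list would not be drawn by the call above.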

To give points that are not listed in col.list a default colour (black here), as raised in the OP's comment, use this in the plot call:

... col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")

This makes the full call look like:

plot( 
      log2(df[,4]) ~ log2(df[,5]),
      xlim=c(0,10),
      ylim=c(0,10),
      col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")
    )
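
The calls above match exact IDs. If the groups are better described by a pattern, as the question title suggests, one option is to build the colour vector with grepl() instead of a lookup table. The following is only a sketch under assumptions: the ids vector copies the six V1 values shown in the question, and the two regular expressions (green.pattern and red.pattern, both hypothetical names) were chosen just to reproduce the colours asked for there.

```r
# The six V1 values from the question, as a plain character vector.
ids <- c("0610007C21Rik", "0610007L01Rik", "0610007P08Rik",
         "0610007P14Rik", "0610007P22Rik", "0610009B22Rik")

# Hypothetical patterns; replace them with whatever rule defines
# your own groups.
green.pattern <- "^0610007(C21|L01)Rik$"
red.pattern   <- "^0610007P(14|22)Rik$"

# Start with a default colour, then overwrite the entries that match.
point.cols <- rep("black", length(ids))
point.cols[grepl(green.pattern, ids)] <- "green"
point.cols[grepl(red.pattern, ids)]   <- "red"

print(point.cols)
# "green" "green" "black" "red" "red" "black"
```

With the real data, ids would be as.character(df$V1) and the result would be passed to plot() as col=point.cols. Later assignments overwrite earlier ones, so if an ID could match more than one pattern, the order of the assignments decides its colour.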
