JoshuaA

Reputation: 279

How can I color data points in R based on a pattern match?

I have a data frame with this form:

             V1 V2                       V3          V4       V5         V6       V7           V8
1 0610007C21Rik  -   chr5:31351012-31356737 1.33732e-05 0.752381  0.9965090 0.000000 1.777419e-05
2 0610007L01Rik  - chr5:130695613-130717165 1.67168e+00 1.673120  0.0000000 3.453930 4.997847e-01
3 0610007P08Rik  -  chr13:63916627-64000808 7.06033e-01 0.000000  0.0815767 0.318051 1.000000e+00
4 0610007P14Rik  -  chr12:87157066-87165495 0.00000e+00 0.000000  0.0000000 5.494230          NaN
5 0610007P22Rik  -  chr17:25377114-25379603 4.99696e+00 0.908254  0.9076130 3.639250 8.461946e-01
6 0610009B22Rik  -  chr11:51499151-51502136 6.53363e-01 8.500980 13.5797000 0.000000 7.137192e-02

I am plotting log2(V4) vs. log2(V5) with this command:

plot(log2(df[,4]) ~ log2(df[,5]), xlim=c(0,10), ylim=c(0,10))

I want to color points based on a pattern match in V1. For instance, how can I color 0610007C21Rik and 0610007L01Rik green and 0610007P22Rik and 0610007P14Rik red? I have tried adding another column to the data frame with a color specified, but there's got to be an easier way.

Upvotes: 2

Views: 644

Answers (2)

thelatemail

Reputation: 93813

Here's a base R solution:

Define your colours once as a named vector, with one entry for each value of df$V1 that you want to colour. Note the quotes around each name: they are required because these names start with a digit and so are not valid bare names in R.

col.list <- c(
              "0610007C21Rik"="green",
              "0610007L01Rik"="green",
              "0610007P22Rik"="red",
              "0610007P14Rik"="red"
             )

Then plot away using df$V1 to look up the values in the col.list vector you have just defined.

plot(
     log2(df[,4]) ~ log2(df[,5]), 
     xlim=c(0,10),
     ylim=c(0,10),
     col=col.list[paste(df$V1)]
    )
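To see what the lookup returns on its own, here is a minimal sketch using a shortened colour vector and a made-up selection of IDs (ids and cols are illustrative names, not from the original). An ID with no entry in col.list comes back as NA, and base graphics generally treats an NA colour as transparent, so such a point would not be visible.

```r
# Shortened version of the named colour vector from the answer
col.list <- c("0610007C21Rik" = "green", "0610007P22Rik" = "red")

# Illustrative IDs: the middle one has no entry in col.list
ids <- c("0610007C21Rik", "0610007P08Rik", "0610007P22Rik")

# Indexing a named vector by character returns the matching values,
# and NA where the name is missing
cols <- unname(col.list[ids])
print(cols)
```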

To address the OP's comment below, so that any point not named in col.list falls back to black, use this in the plot call:

... col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")

This makes the full call look like:

plot( 
      log2(df[,4]) ~ log2(df[,5]),
      xlim=c(0,10),
      ylim=c(0,10),
      col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")
    )
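Since the question asks about a pattern match rather than exact names, the same idea could be sketched with grepl(). This is only one possible approach, not part of the original answer: the regular expressions and the point.col name are assumptions chosen to match the four IDs from the question.

```r
# IDs copied from the V1 column shown in the question
ids <- c("0610007C21Rik", "0610007L01Rik", "0610007P08Rik",
         "0610007P14Rik", "0610007P22Rik", "0610009B22Rik")

# Start with a default colour, then overwrite wherever a pattern matches
point.col <- rep("black", length(ids))
point.col[grepl("^0610007(C21|L01)Rik$", ids)] <- "green"
point.col[grepl("^0610007P(14|22)Rik$", ids)] <- "red"

print(point.col)
```

With the real data, ids would be as.character(df$V1) and the result would be passed as col=point.col in the plot call.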

Upvotes: 1

John

Reputation: 43209

Have a look at the ggplot2 package.

If you post the output of dput() for your data frame, it will be easier for people to help with code.

Here is one example with made-up data that looks a bit like yours. There are better ways to log-transform, however.

df <- data.frame(sample(LETTERS[1:5],20, replace=TRUE), abs(rnorm(20)/100), abs(runif(20)*10))
colnames(df) <- c('V1','V4','V5')


library(ggplot2)

p <- ggplot(df, aes(log2(V4) , log2(V5)))
p + geom_point(aes(colour = V1))
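One such alternative, possibly what is meant by a better way to log-transform, is to leave the data untouched and put the axes on a log2 scale, while choosing the colours explicitly with scale_colour_manual(). The sketch below is an assumption layered on the made-up data above: the set.seed() call and the particular colour assignments are illustrative only.

```r
library(ggplot2)

set.seed(1)  # only so the made-up data is reproducible
df <- data.frame(
  V1 = sample(LETTERS[1:5], 20, replace = TRUE),
  V4 = abs(rnorm(20) / 100),
  V5 = abs(runif(20) * 10)
)

# Transform the axes rather than the data, and map chosen V1 values
# to fixed colours; values without an entry use na.value
p <- ggplot(df, aes(V4, V5, colour = V1)) +
  geom_point() +
  scale_x_continuous(trans = "log2") +
  scale_y_continuous(trans = "log2") +
  scale_colour_manual(
    values = c(A = "green", B = "green", C = "red", D = "red"),
    na.value = "black"
  )
```

Printing p draws the plot. The axis labels then show the original values rather than their logarithms.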

Upvotes: 1
