Vinay

Reputation: 477

Color data points based on sample classification

I have created a pairwise scatterplot showing the relationships between genes (columns of the data frame) across multiple samples (rows of the data frame). The samples belong to two distinct groups, "A" and "B". Since each dot in the plot represents one sample, I need to color the data points (dots) by group using two different colors, say "green" for group A and "red" for group B. Is it possible to do that?

Any kind of help will be appreciated.

plot(DF[1:6], pch = 21) #command used for plotting, DF is data frame


Sample Data Frame Example:

      CBX3      PSPH     ATP2C1   SNX10    MMD      ATP13A3
B     10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A     10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B     10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B     10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B     10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A     10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B     10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A     10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A     10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A     9.855063  5.164399 6.847923 8.072608 8.221344 9.077744
A     10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A     10.501266 6.677360 7.787168 8.444976 8.928174 9.542558

Upvotes: 0

Views: 268

Answers (3)

jraab

Reputation: 413

GGally has a good function for this as well.

library(GGally)
# Recent GGally versions take the grouping variable as an aesthetic mapping
ggpairs(dd, mapping = ggplot2::aes(colour = CLASS), columns = 2:ncol(dd))


Upvotes: 2

Jthorpe

Reputation: 10167

You can color the points by passing the col argument to plot():

DF <- read.table(header = TRUE, file = textConnection(
"category   CBX3    PSPH    ATP2C1  SNX10   MMD ATP13A3
B   10.589844   6.842970    8.084550    8.475023    9.202490    10.403811
A   10.174385   5.517944    7.736994    9.094834    9.253766    10.133408
B   10.202084   5.669137    7.392141    7.522270    7.830969    9.123178
B   10.893231   6.630709    7.601690    7.894177    8.979142    9.791841
B   10.071038   5.091222    7.032585    8.305581    7.903737    8.994821
A   10.005002   4.708631    7.927246    7.292527    8.257853    10.054630
B   10.028055   5.080944    6.421961    7.616856    8.287496    9.642294
A   10.144115   6.626483    7.686203    7.970934    7.919615    9.475175
A   10.675386   6.874047    7.900560    7.605519    8.585158    8.858613
A   9.855063    5.164399    6.847923    8.072608    8.221344    9.077744
A   10.994228   6.545318    8.606128    8.426329    8.787876    9.857079
A   10.501266   6.677360    7.787168    8.444976    8.928174    9.542558"))

# Group A in green and group B in red, as requested in the question
plot(DF[2:7], col = ifelse(DF$category == 'A', 'green', 'red'))
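
As a minimal sketch of the same idea, a named lookup vector can replace ifelse(), which also scales to more than two groups. The small DF below is just a subset of the question's data, and group_colors and point_colors are names made up for illustration. Because the question uses pch = 21, the fill is set through bg rather than col.

```r
# Illustrative subset of the question's data (first four samples, three genes)
DF <- data.frame(
  category = c("B", "A", "B", "B"),
  CBX3     = c(10.589844, 10.174385, 10.202084, 10.893231),
  PSPH     = c(6.842970, 5.517944, 5.669137, 6.630709),
  ATP2C1   = c(8.084550, 7.736994, 7.392141, 7.601690)
)

# Named lookup vector: group label -> color (hypothetical name)
group_colors <- c(A = "green", B = "red")

# Index the lookup by each sample's group to get one color per row
point_colors <- unname(group_colors[as.character(DF$category)])

# pch = 21 draws a circle with a border (col) and a fill (bg)
plot(DF[2:4], pch = 21, col = "black", bg = point_colors)
```

Any group label missing from group_colors would come back as NA, so it is worth checking anyNA(point_colors) on real data.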

A list of valid color names can be obtained by calling colors(). Vectors with a gradient of colors can be created via rainbow(). Just for fun, I use this little function to pick pretty colors interactively when making a figure.

(Edited per suggestions from @MrFlick)

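#' @param n The number of colors to be selected
#' (alpha and term are accepted but currently unused)
colorchoose <- function (n = 1, alpha, term = FALSE) 
{
    cols <- colors()
    # side length of a roughly square grid that holds every named color
    mod <- ceiling(sqrt(length(cols)))
    plot(xlab = "", ylab = "", main = "click for color name", 
        c(0, mod), c(0, mod), type = "n", axes = FALSE)
    s <- seq_along(cols)
    dev.hold()
    # color i is drawn at column i %% mod, row i %/% mod
    points(s %% mod, s %/% mod, col = cols, pch = 15, cex = 2.4)
    dev.flush()
    # wait for n mouse clicks, then map each click back to its color index
    p <- locator(n)
    return(cols[round(p$y) * mod + round(p$x)])
}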
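
For a non-interactive alternative, here is a rough sketch of how colors() and rainbow() could be used to build one color per sample. The groups factor and the variable names are assumptions for illustration only.

```r
# All color names that base R understands
all_names <- colors()
head(all_names)

# Check that the names we plan to use are valid before plotting
c("green", "red") %in% all_names

# Hypothetical group labels, one per sample
groups <- factor(c("B", "A", "B", "B"))

# One rainbow color per factor level, then one color per sample
palette_by_level <- rainbow(nlevels(groups))
point_colors <- palette_by_level[as.integer(groups)]
point_colors
```

Indexing by as.integer(groups) relies on the factor's level order, so the first palette entry goes to level "A" and the second to level "B".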

Upvotes: 1

MrFlick

Reputation: 206197

It might not be that easy to do with base graphics, but you can easily do this with lattice. With this sample data frame:

dd<-structure(list(CLASS = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), 
    CBX3 = c(10.589844, 10.174385, 10.202084, 10.893231, 10.071038, 
    10.005002, 10.028055, 10.144115, 10.675386, 9.855063, 10.994228, 
    10.501266), PSPH = c(6.84297, 5.517944, 5.669137, 6.630709, 
    5.091222, 4.708631, 5.080944, 6.626483, 6.874047, 5.164399, 
    6.545318, 6.67736), ATP2C1 = c(8.08455, 7.736994, 7.392141, 
    7.60169, 7.032585, 7.927246, 6.421961, 7.686203, 7.90056, 
    6.847923, 8.606128, 7.787168), SNX10 = c(8.475023, 9.094834, 
    7.52227, 7.894177, 8.305581, 7.292527, 7.616856, 7.970934, 
    7.605519, 8.072608, 8.426329, 8.444976), MMD = c(9.20249, 
    9.253766, 7.830969, 8.979142, 7.903737, 8.257853, 8.287496, 
    7.919615, 8.585158, 8.221344, 8.787876, 8.928174), ATP13A3 = c(10.403811, 
    10.133408, 9.123178, 9.791841, 8.994821, 10.05463, 9.642294, 
    9.475175, 8.858613, 9.077744, 9.857079, 9.542558)), .Names = c("CLASS", 
"CBX3", "PSPH", "ATP2C1", "SNX10", "MMD", "ATP13A3"), class = "data.frame", row.names = c(NA, -12L))

you can do

library(lattice)
splom(~dd[,-1], groups=dd$CLASS)

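
If you want the specific colors from the question plus a legend, something along these lines should work. This is only a sketch: the small dd below is a subset of the sample data, and setting the colors through par.settings is one possible approach that keeps the key in sync with the points.

```r
library(lattice)

# Illustrative subset of the sample data (first six samples, three genes)
dd <- data.frame(
  CLASS  = factor(c("B", "A", "B", "B", "B", "A")),
  CBX3   = c(10.589844, 10.174385, 10.202084, 10.893231, 10.071038, 10.005002),
  PSPH   = c(6.842970, 5.517944, 5.669137, 6.630709, 5.091222, 4.708631),
  ATP2C1 = c(8.084550, 7.736994, 7.392141, 7.601690, 7.032585, 7.927246)
)

# Colors follow the factor level order: "A" -> green, "B" -> red
p <- splom(~dd[, -1], groups = dd$CLASS,
           par.settings = list(
             superpose.symbol = list(col = c("green", "red"), pch = 16)
           ),
           auto.key = list(columns = 2))

# lattice plots are objects; print() draws them (needed inside scripts)
print(p)
```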

Upvotes: 1
