Plot all observations but colour them by group

Question

I have the sample data below, with unique sample IDs and 3 groups. I need to plot all the observations (rows) in 'df' but color them according to their group IDs ('groupid'). Here's what I have so far:

# sample data creation
samples <- paste0("S",c(1:9))
groupid <- c("group1", "group2", "group3")
foo <- data.frame(Samples = samples, Group = rep(groupid, each = 3))

bar <- data.frame()
for(i in 1:length(samples)){
  ran.data <- rnorm(10, 0.5)
  #colnames <- paste0("w",c(1:length(ran.data)))
  for(j in 1:length(ran.data)){
    bar[i,j] <- ran.data[j]
  }
}
df <- cbind(foo, bar)

# ******************
# creating plot data
plotdf <- as.data.frame(t(df))
cols <- as.character(unlist(plotdf[1,]))
plotdf <- plotdf[-c(1,2),] # removing rows
groupid <- df$Group # var to group by
names(plotdf) <- cols
plotdfrows <- names(df[,3:ncol(df)])
plotdf$rownames <- plotdfrows

# melt and plot
library(reshape2)
library(ggplot2)
melteddf <- melt(plotdf, id.var = "rownames")

final.plot <- ggplot(melteddf, aes(rownames, value, colour = variable, group = groupid)) + geom_point() + #geom_line() +
  scale_y_discrete(breaks=seq(-3, 3, by = 0.5)) + scale_x_discrete() + 
  labs(title = paste("Sample plot"))  #breaks=seq(0, 4, by = 0.5)

print(final.plot)

When I use group = 1, I do get a plot, but the observations are colored differently. Where can I specify the 'groupid' information? Thanks in advance.

Djork · Accepted Answer

In addition to @onlyphantom's answer, there are a few issues with your code.

The manipulation you apply to df before converting it to long format is unnecessary. Notice that your original data frame df has the column Group, which is lost when you manipulate your data. Moreover, if you look at the structure of your melted data frame melteddf, your code created character values rather than numeric values:

str(melteddf)
'data.frame':   90 obs. of  3 variables:
$ rownames: chr  "V1" "V2" "V3" "V4" ...
$ variable: Factor w/ 9 levels "S1","S2","S3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value   : chr  " 0.5705084" " 0.62928774" " 2.2150650" " 0.96091621" ...
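
The character values most likely come from the t(df) step: t() converts a data frame to a matrix, and a matrix can hold only one type, so numbers sitting next to text columns become text. A minimal base R sketch of that behaviour, using a made-up toy data frame (the names toy and toy_t are illustrative and not from the question):

```r
# Toy data frame with one character column and one numeric column.
# 'toy' and 'toy_t' are hypothetical names used only for this illustration.
toy <- data.frame(id = c("a", "b"), x = c(0.5, 1.25), stringsAsFactors = FALSE)

# t() goes through as.matrix(); a matrix holds a single type,
# so the numeric column is coerced to character.
toy_t <- t(toy)
is.character(toy_t)

# Transposing only the numeric column keeps the values numeric.
num_t <- t(toy[, "x", drop = FALSE])
is.numeric(num_t)
```

This is why dropping the Samples and Group rows after transposing does not bring the numbers back: the coercion has already happened.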

You need only one line of code to convert to long format. To preserve your group IDs, add the Group variable to id.vars:

melteddf2 <- melt(df, id.vars=c("Samples", "Group"))

str(melteddf2)
'data.frame':   90 obs. of  4 variables:
$ Samples : Factor w/ 9 levels "S1","S2","S3",..: 1 2 3 4 5 6 7 8 9 1 ...
$ Group   : Factor w/ 3 levels "group1","group2",..: 1 1 1 2 2 2 3 3 3 1 ...
$ variable: Factor w/ 10 levels "V1","V2","V3",..: 1 1 1 1 1 1 1 1 1 2 ...
$ value   : num  0.571 0.611 -0.229 1.378 2.669 ...

head(melteddf2)
  Samples  Group variable      value
1      S1 group1       V1  0.5705084
2      S2 group1       V1  0.6106827
3      S3 group1       V1 -0.2288912
4      S4 group2       V1  1.3781335
5      S5 group2       V1  2.6689560
6      S6 group2       V1  1.8686023

Finally, with respect to your ggplot2 code: your y-axis values are continuous, so you should use scale_y_continuous rather than scale_y_discrete, while your x-axis is already discrete, so scale_x_discrete is not necessary. Use aes(colour = Group) if you want Group to define the colors.

ggplot(melteddf2, aes(x = variable, y = value, colour = Group)) +
  geom_point() +
  scale_y_continuous(breaks = seq(-3, 3, by = 0.5)) +
  labs(title = "Sample plot")
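
If the commented-out geom_line() from the question is also wanted, colour and group can be mapped to different columns: colour = Group picks the color, while group = Samples tells ggplot2 which points belong on the same line. The sketch below is one possible end-to-end version, assuming one line per sample is the goal. The seed, the matrix-based data construction, and the names vals, long and p are choices made for this example, not part of the original code.

```r
library(reshape2)
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the example is reproducible

samples <- paste0("S", 1:9)
groupid <- c("group1", "group2", "group3")

# 9 x 10 matrix of normal draws (mean 0.5), one row per sample;
# this stands in for the nested loop in the question.
vals <- matrix(rnorm(length(samples) * 10, mean = 0.5), nrow = length(samples))
colnames(vals) <- paste0("V", seq_len(ncol(vals)))

df <- data.frame(Samples = samples, Group = rep(groupid, each = 3), vals)

# Long format, keeping both identifier columns.
long <- melt(df, id.vars = c("Samples", "Group"))

# colour = Group sets the color; group = Samples draws one line per sample.
p <- ggplot(long, aes(x = variable, y = value, colour = Group, group = Samples)) +
  geom_point() +
  geom_line() +
  labs(title = "Sample plot")

print(p)
```

Without group = Samples, geom_line() on a discrete x-axis has no way to know which points to connect, which is why the group aesthetic matters here even though it does not control color.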
