snair.stack

Reputation: 415

Plot all observations but colour them by group

I have sample data, given below, with unique sample IDs and 3 groups. I need to plot all the observations (rows) in 'df' but color them according to the group IDs ('groupid'). Here's what I have so far:

# sample data creation
samples <- paste0("S",c(1:9))
groupid <- c("group1", "group2", "group3")
foo <- data.frame(Samples = samples, Group = rep(groupid, each = 3))

bar <- data.frame()
for(i in 1:length(samples)){
  ran.data <- rnorm(10, 0.5)
  #colnames <- paste0("w",c(1:length(ran.data)))
  for(j in 1:length(ran.data)){
    bar[i,j] <- ran.data[j]
  }
}
df <- cbind(foo, bar)

# ******************
# creating plot data
plotdf <- as.data.frame(t(df))
cols <- as.character(unlist(plotdf[1,]))
plotdf <- plotdf[-c(1,2),] # removing rows
groupid <- df$Group # var to group by
names(plotdf) <- cols
plotdfrows <- names(df[,3:ncol(df)])
plotdf$rownames <- plotdfrows

# melt and plot
library(reshape2)
library(ggplot2)
melteddf <- melt(plotdf, id.var = "rownames")

final.plot <- ggplot(melteddf, aes(rownames, value, colour = variable, group = groupid)) + geom_point() + #geom_line() +
  scale_y_discrete(breaks=seq(-3, 3, by = 0.5)) + scale_x_discrete() + 
  labs(title = paste("Sample plot"))  #breaks=seq(0, 4, by = 0.5)

print(final.plot)

When I use group = 1, I get a plot, but the observations are colored by sample rather than by group. Where can I specify the 'groupid' information? Thanks in advance.

Upvotes: 1

Views: 742

Answers (2)

onlyphantom

Reputation: 9583

A variable you map inside aes() should normally be a column of the data frame you pass to ggplot().

This is the data we are going to work with:

set.seed(0)
dat <- data.frame(
  rownames=LETTERS[1:25],
  variables=sample(c("S1", "S2", "S3"), 25, replace = TRUE),
  value=runif(25)
)

groupid = sample(c("group1", "group2", "group3"), 25, replace = TRUE)
# assigning group as a new variable to the data we use for plotting
dat$group <- groupid

The data looks like this:

'data.frame':   25 obs. of  4 variables:
 $ rownames : Factor w/ 25 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ variables: Factor w/ 3 levels "S1","S2","S3": 3 1 2 2 3 1 3 3 2 2 ...
 $ value    : num  0.2672 0.3861 0.0134 0.3824 0.8697 ...
 $ group    : chr  "group3" "group2" "group3" "group2" ...

Notice how the group variable is present in the original data. The code for ggplot is relatively straightforward:

ggplot(dat, aes(x=rownames, y=value, color=group))+
  geom_point()

This produces a scatter plot with the points coloured by group.

The reason your code did not work was that groupid wasn't present in the data you pass into the ggplot call. You specified melteddf as the data parameter, but there was no groupid variable in that melteddf data frame.
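One way to get the group into melteddf is to look up each sample's group in foo with match(). The following is only a minimal sketch: it assumes the melted column variable holds the sample IDs, as in the question, and the two data frames are stand-ins built here with the same column names so the block runs on its own.

```r
library(ggplot2)

# Stand-in for the question's lookup table: one row per sample
foo <- data.frame(Samples = paste0("S", 1:9),
                  Group = rep(c("group1", "group2", "group3"), each = 3),
                  stringsAsFactors = FALSE)

# Stand-in for melteddf: one row per (measurement, sample) pair.
# expand.grid() returns factors whose levels follow the order given.
set.seed(1)
melteddf <- expand.grid(rownames = paste0("V", 1:10),
                        variable = foo$Samples)
melteddf$value <- rnorm(nrow(melteddf), mean = 0.5)

# Attach the group as a real column by looking up each sample in foo
melteddf$Group <- foo$Group[match(melteddf$variable, foo$Samples)]

p <- ggplot(melteddf, aes(x = rownames, y = value, colour = Group)) +
  geom_point()
print(p)
```

With Group stored in the plotting data, colour = Group resolves inside aes() without relying on a vector in the global environment.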

If for some reason you needed the color aesthetics (col) to reference values from a different data frame than the one you specified in your ggplot2 call, you can do that as well.

The following code yields the same result:

set.seed(0)
dat <- data.frame(
  rownames=LETTERS[1:25],
  variables=sample(c("S1", "S2", "S3"), 25, replace = TRUE),
  value=runif(25)
)
# group is a different data frame from dat
group = data.frame("groupid"=sample(c("group1", "group2", "group3"), 25, replace = TRUE))

ggplot(data=dat, aes(x=rownames, y=value))+
  geom_point(aes(col=group$groupid))

Upvotes: 2

Djork

Reputation: 3369

In addition to @onlyphantom's answer, there are a few issues with your code.

You manipulate df more than necessary to convert it to long format. Notice that your original data frame df has the column Group, which is lost during that manipulation. Moreover, if you look at the structure of your melted data frame melteddf, your code created character values rather than numeric values:

str(melteddf)
'data.frame':   90 obs. of  3 variables:
$ rownames: chr  "V1" "V2" "V3" "V4" ...
$ variable: Factor w/ 9 levels "S1","S2","S3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value   : chr  " 0.5705084" " 0.62928774" " 2.2150650" " 0.96091621" ...
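The character values most likely come from t(): transposing a data frame goes through as.matrix(), and a matrix can hold only one type, so mixing the text columns (Samples, Group) with numbers turns everything into character. A small base-R illustration with made-up data (d, m and m_num are names used only for this example):

```r
# Made-up data: one text column and one numeric column
d <- data.frame(id = c("a", "b"), x = c(1.5, 2.5))

m <- t(d)            # a matrix holds a single type, so everything becomes character
is.character(m)      # TRUE

# Transposing only the numeric column keeps the numbers numeric
m_num <- t(d[, "x", drop = FALSE])
is.numeric(m_num)    # TRUE
```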

You need only one line of code to convert to long format; to preserve your group IDs, add the Group variable to your id.vars:

melteddf2 <- melt(df, id.vars=c("Samples", "Group"))

str(melteddf2)
'data.frame':   90 obs. of  4 variables:
$ Samples : Factor w/ 9 levels "S1","S2","S3",..: 1 2 3 4 5 6 7 8 9 1 ...
$ Group   : Factor w/ 3 levels "group1","group2",..: 1 1 1 2 2 2 3 3 3 1 ...
$ variable: Factor w/ 10 levels "V1","V2","V3",..: 1 1 1 1 1 1 1 1 1 2 ...
$ value   : num  0.571 0.611 -0.229 1.378 2.669 ...

head(melteddf2)
  Samples  Group variable      value
1      S1 group1       V1  0.5705084
2      S2 group1       V1  0.6106827
3      S3 group1       V1 -0.2288912
4      S4 group2       V1  1.3781335
5      S5 group2       V1  2.6689560
6      S6 group2       V1  1.8686023

Finally, with respect to your ggplot2 code: your y-axis values are continuous, so you should not use scale_y_discrete, and your x-axis is already discrete, so scale_x_discrete is not necessary. Use aes(colour=Group) if you want to use Group to define color groups.

ggplot(melteddf2, aes(x=variable, y=value, colour = Group)) + geom_point() +
  scale_y_continuous(breaks=seq(-3, 3, by = 0.5)) + 
  labs(title = "Sample plot")

Upvotes: 1
