I have sample data (below) with unique sample IDs and 3 groups. I need to plot all the observations (rows) in 'df' but color them according to their group IDs ('groupid'). Here's what I have so far:
# sample data creation
samples <- paste0("S",c(1:9))
groupid <- c("group1", "group2", "group3")
foo <- data.frame(Samples = samples, Group = rep(groupid, each = 3))
bar <- data.frame()
for(i in 1:length(samples)){
  ran.data <- rnorm(10, 0.5)
  #colnames <- paste0("w",c(1:length(ran.data)))
  for(j in 1:length(ran.data)){
    bar[i,j] <- ran.data[j]
  }
}
df <- cbind(foo, bar)
# ******************
# creating plot data
plotdf <- as.data.frame(t(df))
cols <- as.character(unlist(plotdf[1,]))
plotdf <- plotdf[-c(1,2),] # removing rows
groupid <- df$Group # var to group by
names(plotdf) <- cols
plotdfrows <- names(df[,3:ncol(df)])
plotdf$rownames <- plotdfrows
# melt and plot
library(reshape2)
library(ggplot2)
melteddf <- melt(plotdf, id.var = "rownames")
final.plot <- ggplot(melteddf, aes(rownames, value, colour = variable, group = groupid)) + geom_point() + #geom_line() +
scale_y_discrete(breaks=seq(-3, 3, by = 0.5)) + scale_x_discrete() +
labs(title = paste("Sample plot")) #breaks=seq(0, 4, by = 0.5)
print(final.plot)
When I use group = 1, I do get a plot, but the observations are colored by sample ('variable') rather than by group. Where can I specify the 'groupid' information? Thanks in advance.
Upvotes: 1
Reputation: 9583
Any variable you name inside aes() should be a column of the data frame associated with the plot.
This is the data we are going to work with:
set.seed(0)
dat <- data.frame(
rownames=LETTERS[1:25],
variables=sample(c("S1", "S2", "S3"), 25, replace = TRUE),
value=runif(25)
)
groupid = sample(c("group1", "group2", "group3"), 25, replace = TRUE)
# assigning group as a new variable to the data we use for plotting
dat$group <- groupid
The data looks like this:
'data.frame': 25 obs. of 4 variables:
$ rownames : Factor w/ 25 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ variables: Factor w/ 3 levels "S1","S2","S3": 3 1 2 2 3 1 3 3 2 2 ...
$ value : num 0.2672 0.3861 0.0134 0.3824 0.8697 ...
$ group : chr "group3" "group2" "group3" "group2" ...
Notice how the group variable is present in the data frame itself. The code for ggplot is relatively straightforward:
ggplot(dat, aes(x=rownames, y=value, color=group))+
geom_point()
Your code did not work because groupid is not a column of melteddf, the data frame you passed to ggplot(). ggplot2 then falls back to the groupid vector in your workspace, which has 9 elements (one per sample) while melteddf has 90 rows, so it cannot be mapped onto the points.
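A quick way to confirm this diagnosis, and one possible fix, is sketched below. The data frame here is a hypothetical stand-in with the same shape as the question's melteddf (the seed and the helper name sample_to_group are assumptions, not part of the original code); it checks whether groupid is a column and whether its length matches, then attaches a Group column through a lookup keyed by sample ID.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility

# Hypothetical stand-in with the same shape as the question's melteddf:
# 9 samples x 10 measurements = 90 rows.
samples <- paste0("S", 1:9)
melteddf <- data.frame(
  rownames = rep(paste0("V", 1:10), times = 9),
  variable = rep(samples, each = 10),
  value    = rnorm(90, mean = 0.5)
)

# One group label per sample, as in the question (length 9).
groupid <- rep(c("group1", "group2", "group3"), each = 3)

# Neither condition holds, so groupid cannot be mapped row by row.
has_column  <- "groupid" %in% names(melteddf)
same_length <- length(groupid) == nrow(melteddf)
print(c(has_column = has_column, same_length = same_length))  # both FALSE

# Possible fix: look up each row's group by its sample ID.
sample_to_group <- setNames(groupid, samples)
melteddf$Group <- unname(sample_to_group[as.character(melteddf$variable)])
print(table(melteddf$Group))
```

Once Group is a real column with one value per row, aes(colour = Group) can find it in the data.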
If for some reason you need the color aesthetic (col) to reference values from a different data frame than the one you specified in your ggplot2 call, you can do that as well, provided that data frame has the same number of rows in the same order.
The following code yields the same result:
set.seed(0)
dat <- data.frame(
rownames=LETTERS[1:25],
variables=sample(c("S1", "S2", "S3"), 25, replace = TRUE),
value=runif(25)
)
# group is a different data frame from dat
group = data.frame("groupid"=sample(c("group1", "group2", "group3"), 25, replace = TRUE))
ggplot(data=dat, aes(x=rownames, y=value))+
geom_point(aes(col=group$groupid))
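Referencing group$groupid inside aes() relies on the two data frames lining up row by row. A more defensive variant, sketched here as a suggestion rather than as part of the original answer, checks the row counts and copies the values into dat first, so the mapping refers to a real column (the stopifnot() guard and the object name p are additions for illustration):

```r
library(ggplot2)

set.seed(0)
dat <- data.frame(
  rownames  = LETTERS[1:25],
  variables = sample(c("S1", "S2", "S3"), 25, replace = TRUE),
  value     = runif(25)
)
# group is a different data frame from dat
group <- data.frame(
  groupid = sample(c("group1", "group2", "group3"), 25, replace = TRUE)
)

# Positional matching is only safe when both frames have the same
# number of rows in the same order; fail early otherwise.
stopifnot(nrow(group) == nrow(dat))
dat$group <- group$groupid

p <- ggplot(dat, aes(x = rownames, y = value, colour = group)) +
  geom_point()
print(p)
```

A side effect is that the legend title becomes "group" instead of the expression "group$groupid".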
Upvotes: 2
Reputation: 3369
In addition to @onlyphantom's answer, there are a few issues with your code.
You manipulate df more than necessary to convert it to long format. Notice that your original data frame df has the column Group, which is lost in that manipulation. Moreover, if you look at the structure of your melted data frame melteddf, your code created character values rather than numeric ones (t() coerces the mixed-type data frame to a character matrix):
str(melteddf)
'data.frame': 90 obs. of 3 variables:
$ rownames: chr "V1" "V2" "V3" "V4" ...
$ variable: Factor w/ 9 levels "S1","S2","S3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : chr " 0.5705084" " 0.62928774" " 2.2150650" " 0.96091621" ...
You need only one line of code to convert to long format, and to preserve your group IDs you can add the Group variable to your id.vars:
melteddf2 <- melt(df, id.vars=c("Samples", "Group"))
str(melteddf2)
'data.frame': 90 obs. of 4 variables:
$ Samples : Factor w/ 9 levels "S1","S2","S3",..: 1 2 3 4 5 6 7 8 9 1 ...
$ Group : Factor w/ 3 levels "group1","group2",..: 1 1 1 2 2 2 3 3 3 1 ...
$ variable: Factor w/ 10 levels "V1","V2","V3",..: 1 1 1 1 1 1 1 1 1 2 ...
$ value : num 0.571 0.611 -0.229 1.378 2.669 ...
head(melteddf2)
Samples Group variable value
1 S1 group1 V1 0.5705084
2 S2 group1 V1 0.6106827
3 S3 group1 V1 -0.2288912
4 S4 group2 V1 1.3781335
5 S5 group2 V1 2.6689560
6 S6 group2 V1 1.8686023
Finally, with respect to your ggplot2 code: your y-axis values are continuous, so you should not use scale_y_discrete, and your x-axis is already discrete, so scale_x_discrete is not necessary. Use aes(colour = Group) if you want Group to define the color groups.
ggplot(melteddf2, aes(x=variable, y=value, colour = Group)) + geom_point() +
scale_y_continuous(breaks=seq(-3, 3, by = 0.5)) +
labs(title = paste("Sample plot"))
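For reference, the whole pipeline can be condensed into one self-contained script. This is a sketch under a few assumptions: the nested loop from the question is replaced with a single matrix() call (a swapped-in technique, not the asker's code), the seed is arbitrary, and the random values will therefore differ from the output shown above.

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # arbitrary seed, assumed only for reproducibility

samples <- paste0("S", 1:9)
groupid <- c("group1", "group2", "group3")

# 9 samples x 10 measurements, built in one call instead of a nested loop.
# as.data.frame() on an unnamed matrix names the columns V1..V10.
measurements <- as.data.frame(matrix(rnorm(9 * 10, mean = 0.5), nrow = 9))

df <- cbind(
  data.frame(Samples = samples, Group = rep(groupid, each = 3)),
  measurements
)

# Keep both identifier columns so Group survives the reshape.
melteddf2 <- melt(df, id.vars = c("Samples", "Group"))

p <- ggplot(melteddf2, aes(x = variable, y = value, colour = Group)) +
  geom_point() +
  scale_y_continuous(breaks = seq(-3, 3, by = 0.5)) +
  labs(title = "Sample plot")
print(p)
```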
Upvotes: 1