Reputation: 26570
Let's assume we have the following data frame
data <- data.frame(time=1:10, y1=runif(10), y2=runif(10), y3=runif(10))
and we want to create a plot like this:
library(ggplot2)
p <- ggplot(data, aes(x=time))
p <- p + geom_line(aes(y=y1, colour="y1"))
p <- p + geom_line(aes(y=y2, colour="y2"))
p <- p + geom_line(aes(y=y3, colour="y3"))
plot(p)
But what if we have many more "y" columns, and we do not know their exact names? This raises the question: how can we iterate over all columns programmatically and add them to the plot? Basically the goal is:
otherFeatures <- names(data)[-1]
for (f in otherFeatures) {
# what goes here?
}
So far I have found many ways that do not work. For instance (all following examples only show the code line in the above for loop):
My first try was simply to use aes_string instead of aes in order to specify the column name by the loop variable f:
p <- p + geom_line(aes_string(y=f, colour=f))
But this does not give the same result, because now colour will not be a fixed color for each line (aes_string will interpret f in the data frame environment). As a result, the legend becomes a color bar and does not contain the different column names. My next guess was to mix aes and aes_string, trying to set colour to a fixed string:
p <- p + geom_line(aes_string(y=f), aes(colour=f))
But this results in Error: ggplot2 doesn't know how to deal with data of class uneval. My next attempt was to use colour "absolutely" (not within aes) like this:
p <- p + geom_line(aes_string(y=f), colour=f)
But this gives Error: invalid color name 'y1' (and I don't want to pick some proper color names manually either). The next try was to go back to aes only, replicating the manual approach:
p <- p + geom_line(aes(y=data[[f]], colour=f))
This does not give an error, but only the last column is plotted. This makes sense, since aes will probably call substitute, and the expression will always be evaluated with the last value of f in the loop (calling rm(f) before plot(p) gives an error, indicating that the evaluation happens after the loop).
To rephrase the question: What kind of substitute/eval/quote magic is necessary to replicate the simple code from above within a for loop?
Upvotes: 10
Views: 3898
Reputation: 66
This is old now, but in case anyone else comes across it: I had a very similar problem that was driving me crazy. The solution I found was to pass aes_q() to geom_line(), using as.name() to turn the column name into a symbol. You can find details on aes_q() here. Below is the way I would solve this problem, though the same principle should work in a loop. Note that I add multiple variables with geom_line() as a list here, which generalizes better (including to one variable).
varnames <- c("y1", "y2", "y3")
add_lines <- lapply(varnames, function(i) geom_line(aes_q(y = as.name(i), colour = i)))
p <- ggplot(data, aes(x = time))
p <- p + add_lines
plot(p)
Hope that helps!
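For readers on a newer ggplot2: as far as I know, aes_q() and aes_string() have been soft-deprecated since version 3.0.0, and aes() itself accepts the !! unquoting operator. The following is only a sketch of the same idea inside a for loop, assuming ggplot2 >= 3.0.0; the set.seed() call is my addition so the example is reproducible.

```r
library(ggplot2)

set.seed(1)  # assumption: only here to make the random data reproducible
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

p <- ggplot(data, aes(x = time))
for (f in names(data)[-1]) {
  # !! injects the current value of f immediately, so nothing is left
  # to be looked up after the loop has finished:
  #   y      becomes the column symbol (y1, y2, ...)
  #   colour becomes the constant string ("y1", "y2", ...)
  p <- p + geom_line(aes(y = !!as.name(f), colour = !!f))
}
print(p)
```

Because colour is a constant string in each layer, the legend should list the column names, as in the hand-written version from the question.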
Upvotes: 5
Reputation: 444
NOTE: This is not really an answer, just a very partial explanation of what is going on behind the scenes that might set you on the right track. I have to admit my understanding of NSE is still very basic.
I have struggled and am still struggling with this particular issue. I have narrowed down the issue to NSE. I am not familiar with R's native substitute/quote/eval stuff, so I am going to demonstrate using the lazyeval package.
library(lazyeval)
a <- lapply(c(1:9,13), function(i) lazy(i))
head(a)
# [[1]]
# <lazy>
# expr: c(1, 2, 3, 4, 5, 6, 7, 8, 9, 13)[[10L]]
# env: <environment: 0x25889a00>
#
# [[2]]
# <lazy>
# expr: c(1, 2, 3, 4, 5, 6, 7, 8, 9, 13)[[10L]]
# env: <environment: 0x25889a00>
#
# ...........
lazy_eval(a[[1]])
# [1] 13
lazy_eval(a[[2]])
# [1] 13
I think this happens because lazy(i) binds to the promise of i. By the time we get to evaluating any of these promises, i is whatever was LAST assigned to it, in this case 13. Perhaps this is due to the environment in which i is evaluated being shared over all iterations of the lapply function?
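The for-loop case from the question can be reproduced in base R without any plotting. This is a minimal sketch, not taken from ggplot2 or lazyeval; the names late, early and current are made up for illustration. It shows that functions created in a for loop all see the same loop variable, and that copying the value into a fresh environment with local() avoids this.

```r
cols <- c("y1", "y2", "y3")

# Every function created here refers to the same variable f,
# which holds "y3" once the loop is done.
late <- list()
for (f in cols) {
  late[[f]] <- function() f
}
late_values <- vapply(late, function(g) g(), character(1))
print(late_values)   # all three functions return "y3"

# local() gives each iteration its own environment, and the
# assignment to `current` copies the value of f immediately.
early <- list()
for (f in cols) {
  early[[f]] <- local({
    current <- f
    function() current
  })
}
early_values <- vapply(early, function(g) g(), character(1))
print(early_values)  # "y1", "y2", "y3"
```

The aes(y=data[[f]], colour=f) attempt in the question behaves like the first loop: the expression is stored unevaluated and f is only looked up when the plot is drawn.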
I have had to resort to the same workarounds as you through aes_string and aes_q. I found them quite unsatisfactory, as they are neither (1) fully consistent with aes behavior nor (2) particularly clean. Oh, the joys of learning NSE ;)
You can find the source code of the + and aes operators here:
ggplot2:::`+.gg`
ggplot2:::aes
ggplot2:::aes_q
ggplot2:::aes_string
Upvotes: 1
Reputation: 2400
You could melt (thanks for reminding me of this function, rawr) all of your data into a few columns. For example, it could look like this:
library(reshape2)
data2 <- melt(data, id = "time")
head(data2)
#   time variable       value
# 1    1       y1 0.353088575
# 2    2       y1 0.621565368
# 3    3       y1 0.696031085
# 4    4       y1 0.507112969
# 5    5       y1 0.009560710
# 6    6       y1 0.158993988
ggplot(data2, aes(x = time, y = value, color = variable)) + geom_line()
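If reshape2 is not available, the same long format can probably be produced with tidyr. This is a sketch of an alternative, not part of the original answer; it assumes tidyr (1.0.0 or later, where pivot_longer() exists) and ggplot2 are installed, and the name data_long is made up.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # assumption: only here to make the random data reproducible
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

# Stack every column except `time` into name/value pairs,
# using the same column names that melt() produces.
data_long <- pivot_longer(data, cols = -time,
                          names_to = "variable", values_to = "value")

p <- ggplot(data_long, aes(x = time, y = value, colour = variable)) +
  geom_line()
print(p)
```

Either way, no loop is needed: a single geom_line() layer draws one line per value of variable.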
Upvotes: 3