Gnosos

Reputation: 47

"Error: Aesthetics must be either length 1 or the same as the data" Why?

Wickham (2009: 164-6) gives an example of plotting multiple time series simultaneously. The following code replicates his example:

  library(ggplot2)  # qplot(), ggplot(), economics_long
  library(plyr)     # ddply() and .()

  # Rescale a numeric vector linearly onto [0, 1]
  range01 <- function(x) {
      rng <- range(x, na.rm = TRUE)
      (x - rng[1]) / diff(rng)
  }

  emp <- subset(economics_long, variable %in% c("uempmed", "unemploy"))
  emp2 <- ddply(emp, .(variable), transform, value = range01(value))
  qplot(date, value, data = emp2, geom = "line", color = variable, linetype = variable)
  # Produces a plot that looks like the one on p. 166 of the ggplot2 book.

Here range01 is used to rescale each variable to values within [0,1] so that series with different orders of magnitude can be plotted on identical scales. Wickham's original also starts with the economics data provided with ggplot2 and melts it into long form, but here I've taken the shortcut of starting with the economics_long version.
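As a quick sanity check on the rescaling, here is a minimal sketch in base R. The vectors `small` and `large` are made-up values for illustration, not taken from the economics data; the point is only that two series on very different scales end up on the same [0,1] range.

```r
# range01 as defined in the question: rescale x linearly onto [0, 1]
range01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}

# Two made-up series that differ by three orders of magnitude (illustrative only)
small <- c(4, 6, 10)
large <- c(4000, 6000, 10000)

# After rescaling, both run from 0 (their minimum) to 1 (their maximum)
scaled_small <- range01(small)
scaled_large <- range01(large)
print(scaled_small)
print(scaled_large)
```

Because the rescaling is done separately for each series (via ddply in the question), the shapes of the curves are comparable but the original units are lost.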

But Wickham (p. 27) also points out that tapping the "full power" of ggplot2 requires manual construction of plots by layers, using the ggplot() function. Here is his example again but using ggplot() instead of qplot():

# Now do the same thing using ggplot commands only
ggplot(data = emp2, aes(x = date)) +
  geom_line(aes(y = value, group = variable, color = variable, linetype = variable))
# Get the same results

Both examples take advantage of ggplot2's default settings. But suppose we want greater control over the aesthetics. Perhaps some variables lend themselves to particular color schemes (e.g., green might be used for environmentally friendly variables and black for detrimental ones); or perhaps in a long monograph with many plots we just want to ensure consistency. Furthermore, if the plots will be used both in presentations and in printed black-and-white text, we may also want to associate specific line types with particular series; the same applies if we are concerned about viewers with color blindness. Finally, variable names are usually poor descriptors of what the variables really are, so we want to associate descriptive labels with the individual time series.

So we define the following for the economics dataset:

# Try to control the look a bit more
economics_colors = c("pce" = "red", "pop" = "orange", "psavert" = "yellow",
    "uempmed" = "green", "unemploy" = "blue")
economics_linetypes = c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
    "uempmed" = "dotdash", "unemploy" = "longdash")
economics_labels = c(
    "pce" = "Personal consumption expenditures",
    "pop" = "Total population",
    "psavert" = "Personal savings rate",
    "uempmed" = "Median duration of unemployment",
    "unemploy" = "Number of unemployed"
)

Now construct the plot by adding separate layers (Wickham 2009: 164-5) for each variable:

# First do it line-by-line
employment.plot <- ggplot(emp2) + aes(x = date)  +
  scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
employment.plot <- employment.plot +
  geom_line(data = subset(emp2, variable == "uempmed"),
            aes(y = value, linetype = "uempmed"), color = economics_colors["uempmed"])
employment.plot <- employment.plot +
  geom_line(data = subset(emp2, variable == "unemploy"),
            aes(y = value, linetype = "unemploy"), color = economics_colors["unemploy"])
employment.plot
# Except for the specific line colors, produces the same plot as before.

Notice two things here. First, line types are mapped but colors are set (see Wickham 2009: 47-49). This produces the desired result of a single legend with distinct color-linetype combinations for each series.

Second, even though the data are organized in "long" format, we used subset to pick out individual series. This is not the best solution. As Wickham (2009: 164-5) says:

... a better alternative is to melt the data into a long format and then visualize that. In the molten data the time series have their value stored in the value variable and we can distinguish between them with the variable variable.

So let's try this approach:

# Now try it the automatic way
employment.plot <- ggplot(data = emp2, aes(x = date))  +
  scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
employment.plot <- employment.plot +
  geom_line(aes(y = value, group = variable, linetype = economics_linetypes), color = economics_colors)
employment.plot
# Throws "Error: Aesthetics must be either length 1 or the same as the data (1148) ..."

As the comment indicates, this code throws an error regarding the aesthetics. Why?

Also, is there another way to accomplish the multiple goals of using melted data with the single variable variable triggering separate lines, controlling which colors and line types are associated with each series, and using code to standardize such conventions across multiple plots?

References

Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer.

Upvotes: 0

Views: 2382

Answers (1)

GGamba

Reputation: 13680

Inside aes(), each aesthetic should be mapped to a variable (a column) of the dataset.

What you are saying with the last command is: "For each 'data point' (or group in this case) assign a linetype equal to its economics_linetypes."

But there is no information (yet) on how to match each record (or group) to a value in economics_linetypes: the vector has 5 elements, while the data has 1148 rows. The same goes for color = economics_colors, which is set outside aes() with a vector of length 5. So it rightly returns an error.
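To make the length mismatch concrete, here is a small base R sketch. It is only an illustration of the idea of matching by name, not a description of ggplot2's internals; the `variable` vector is a made-up stand-in for the column in the long data, and only two of the five linetypes are included for brevity.

```r
# Named vector from the question (two of the five entries, for brevity)
economics_linetypes <- c("uempmed" = "dotdash", "unemploy" = "longdash")

# Made-up stand-in for the `variable` column of the long data
variable <- c("uempmed", "uempmed", "unemploy", "unemploy", "unemploy")

# The lookup vector has one entry per series, not one entry per row,
# so it cannot be used directly as a per-row aesthetic.
n_types <- length(economics_linetypes)
n_rows <- length(variable)

# Matching by name is what turns "one value per series" into "one value per row";
# conceptually this is the job a manual scale does for you.
per_row <- unname(economics_linetypes[variable])
print(per_row)
```

Mapping linetype = variable inside aes() and supplying the named vector to a manual scale lets ggplot2 do this matching, rather than passing the short vector as if it were row-level data.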

What you should do is map the linetype to the dimension that controls it. That is: "for each value in this dimension, use a different value of linetype" i.e.:

geom_line(aes(y = value, group = variable, linetype = variable))

Once we have that defined we can map the value of variable to a specific linetype with the definition of a scale:

scale_linetype_manual(values = economics_linetypes, labels = economics_labels)

All of this applies to color as well, of course, so in the end we have:

employment.plot <- ggplot(data = emp2, aes(x = date))  +
    geom_line(aes(y = value, group = variable, linetype = variable, color = variable)) +
    scale_linetype_manual(values = economics_linetypes, labels = economics_labels) +
    scale_color_manual(values = economics_colors, labels = economics_labels)

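On the question of standardizing these conventions across multiple plots, one possible approach is to bundle the manual scales in a list and add that list to each plot. The sketch below assumes ggplot2 is installed; `economics_scales()` is a hypothetical helper name introduced here, not a ggplot2 function, and the rescaling uses base R's ave() in place of ddply so the block has no plyr dependency.

```r
library(ggplot2)

# Named vectors from the question
economics_colors <- c("pce" = "red", "pop" = "orange", "psavert" = "yellow",
                      "uempmed" = "green", "unemploy" = "blue")
economics_linetypes <- c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
                         "uempmed" = "dotdash", "unemploy" = "longdash")
economics_labels <- c("pce" = "Personal consumption expenditures",
                      "pop" = "Total population",
                      "psavert" = "Personal savings rate",
                      "uempmed" = "Median duration of unemployment",
                      "unemploy" = "Number of unemployed")

# Hypothetical helper (not part of ggplot2): returns both manual scales in a
# list so that any plot can pick up the same conventions with a single `+`.
economics_scales <- function() {
  list(
    scale_color_manual(values = economics_colors, labels = economics_labels),
    scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
  )
}

# Rescale each series onto [0, 1], as range01 does in the question
range01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}
emp <- subset(economics_long, variable %in% c("uempmed", "unemploy"))
emp$scaled <- ave(emp$value, as.character(emp$variable), FUN = range01)

# Map both aesthetics to `variable`; the shared scales decide the actual look
p <- ggplot(emp, aes(x = date, y = scaled, color = variable, linetype = variable)) +
  geom_line() +
  economics_scales()
p
```

Since both scales use the same labels, they should combine into a single legend, and any other plot built from the economics series can reuse the same conventions by adding economics_scales().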

Hope this is clear enough.

Upvotes: 2
