User878239

Reputation: 709

Variable order in plot

I am using ggplot2 to draw some lines and would like to change the legend labels. My data has two variables, x1 and x2.

How can I assign the labels to x1 and x2 in a controlled way, so that a given label always goes to x1 and another one to x2, and never the other way around? For instance, I would like to assign "AAAA" as the label for x1 and "BBBB" as the label for x2, and NOT "BBBB" to x1 and "AAAA" to x2. The following example shows what I mean:

library(data.table)
library(ggplot2)
library(scales)

set.seed(1)
test <- data.table(x = rnorm(29*2),var=c(rep("x1",29),rep("x2",29)),
                   time=rep(seq(as.Date("1983/12/31"),as.Date("2011/12/31"), "year"),2))

ggplot(data=test, aes(x=time, y=x, colour=var)) + 
  geom_line() +
  scale_color_manual(labels = c("AAAA","BBBB"),values=c("blue","red"))

I am pretty sure that in the above example "AAAA" is assigned to x1, because x1 comes first in the data. But I am not always sure which variable comes first. Is there a more direct way to make the assignment, so that I keep control over which label goes where?

Thanks for any hints.

Upvotes: 1

Views: 210

Answers (2)

davidnortes

Reputation: 922

Just to offer you an alternative to Dave's answer: you can also use named vectors for both labels and colors, using the values of var ("x1" and "x2") as the names of the vector elements.

The advantage of this approach is that you do not need to modify your data (which is always risky, controversial and error-prone), yet you get full control over how ggplot displays it in a simple and highly readable way.

With this approach, your code would look as follows (notice that I'm just tweaking your code a little bit):

library(ggplot2)
library(scales)
library(data.table)

set.seed(1)
test <- data.table(x = rnorm(29*2),var=c(rep("x1",29),rep("x2",29)),
                   time=rep(seq(as.Date("1983/12/31"),as.Date("2011/12/31"), "year"),2))

#Declaring named vector of labels 'plabels'
plabels <- c('x1' = "AAAA",
             'x2' = "BBBB")

#Declaring named vector of colors 'pcolors'
pcolors <- c('x1' = "green",
             'x2' = "blue")

#Plotting
ggplot(data=test, aes(x=time, y=x, colour=var)) + 
  geom_line() +
  scale_color_manual(labels = plabels, values=pcolors)

This results in a plot where x1 is drawn in green and labelled "AAAA", and x2 is drawn in blue and labelled "BBBB".
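As a quick check that the mapping follows the names rather than the positions, here is a minimal sketch in which both named vectors are deliberately written in reverse order. The toy data frame `df` and the names `rev_labels` and `rev_colors` are made up for this illustration and are not part of the original code.

```r
library(ggplot2)

# Hypothetical toy data: two series, "x1" and "x2"
set.seed(1)
df <- data.frame(
  time = rep(1:10, 2),
  x    = rnorm(20),
  var  = rep(c("x1", "x2"), each = 10)
)

# Named vectors written in reverse order on purpose
rev_labels <- c("x2" = "BBBB", "x1" = "AAAA")
rev_colors <- c("x2" = "blue", "x1" = "green")

# Labels and colours are matched by name, so x1 should still be
# shown as "AAAA" in green and x2 as "BBBB" in blue
p <- ggplot(df, aes(x = time, y = x, colour = var)) +
  geom_line() +
  scale_color_manual(labels = rev_labels, values = rev_colors)

p
```

If the legend still shows "AAAA" in green for x1, the order in which the vectors are written does not matter, which is the point of naming them.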

Upvotes: 2

Dave

Reputation: 359

Without scale_color_manual, a different color is assigned automatically to each value of the variable.

I think what you should do is change the values of the variable itself, so that they become the "new labels".

Does this work for you?

test$var <- as.factor(test$var) # It's a categorical variable.
levels(test$var) <- c("AAAA","BBBB") # We change x1 and x2 by AAAA and BBBB

ggplot(data=test, aes(x=time, y=x, colour=var)) + 
  geom_line()

From now on, all your plots that use var will have x1 as AAAA and x2 as BBBB.
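Note that assigning to `levels()` relies on the order of the existing levels, which `as.factor` sorts alphabetically for character data. A minimal sketch of a more explicit variant, assuming you prefer to state the pairing yourself, is to pass both `levels` and `labels` to `factor()`. The vector `v` below is a hypothetical stand-in for `test$var`.

```r
# Hypothetical stand-in for test$var, with "x2" appearing first in the data
v <- c("x2", "x1", "x2", "x1")

# Pair each original value with its new label explicitly:
# "x1" -> "AAAA", "x2" -> "BBBB", regardless of the order in the data
f <- factor(v, levels = c("x1", "x2"), labels = c("AAAA", "BBBB"))

as.character(f)
# "BBBB" "AAAA" "BBBB" "AAAA"
```

Because the levels and their labels are written side by side, the pairing stays visible in the code.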

On the other hand, if you want to apply these changes without having to check the code by hand (so that you cannot be caught out by the order of the values in the column), I suggest keeping a table in which each row holds an original value and the value you want in its place, like a dictionary. (In my example I create it in the code as transf_vals, but it could be an external table.)

Then use this instead of the code shown above, starting again from the original test$var:

transf_vals <- data.frame("original" = c("x1", "x2"), "new" = c("AAAA","BBBB"),
                          stringsAsFactors = FALSE) # This could be a .csv or Excel file; keep the columns as character

test$var <- sapply(test$var, FUN = function(x){
  transf_vals$new[which(transf_vals$original == x)]
})

ggplot(data=test, aes(x=time, y=x, colour=var)) + 
  geom_line()


With sapply I do the following:

  1. For each value (row) of the column test$var,
  2. find where it is located in my reference table transf_vals.
  3. As that table holds both the original and the new value, replace the original value with the new one, which is in the other column of transf_vals.
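The same lookup can also be done without an explicit loop. The following is a minimal sketch, assuming the reference table has character columns: it turns the table into a named vector and indexes it with the original values. The names `lookup`, `v` and `new_v` are made up for this illustration.

```r
# Reference table as in the answer, kept as character columns
transf_vals <- data.frame(original = c("x1", "x2"),
                          new      = c("AAAA", "BBBB"),
                          stringsAsFactors = FALSE)

# Named vector: names are the original values, elements the new ones
lookup <- setNames(transf_vals$new, transf_vals$original)

# Hypothetical stand-in for test$var
v <- c("x2", "x1", "x1", "x2")

# Indexing by name translates every value in one vectorized step
new_v <- unname(lookup[v])

new_v
# "BBBB" "AAAA" "AAAA" "BBBB"
```

A value that is missing from the reference table would come back as NA here, which makes gaps in the dictionary easy to spot.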

Upvotes: 1
