Konrad

Colouring a specific label in ggplot depending on the value of the id variable in long data (irrespective of the row number)

Let's say that I have a long data set and I would like to colour a specific label on the x-axis. In the example below I would like to colour the label for Valiant.

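# Packages
library(ggplot2)
library(reshape2)

# Data: reshape to long format, one row per model/variable pair (352 rows)
data(mtcars)
mtcars$model <- rownames(mtcars)
mtcars <- melt(mtcars, id.vars = "model")

# Chart of cyl per model, trying to colour the Valiant label red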
ggplot(data = subset(x = mtcars, subset = mtcars$variable == "cyl"), 
       aes(x = model, y = value)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, 
                                     colour =
                                         ifelse(mtcars$model == "Valiant",
                                                "red","black")))

The code produces the chart below, which is wrong: a different label is coloured instead of Valiant.

[Image: Wrong label]

The reason is fairly simple: the vector created by ifelse does not match the order of the labels on the axis. I can work around this by colouring a specific position instead. The code below colours the right label because Valiant happens to be the 31st label on the x-axis (the models are plotted in alphabetical order), so the 31st element of the colour vector is the one applied to it.
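A quick base-R check shows why hard-coding position 31 happens to work even though Valiant is not the 31st row of the data. This is a sketch that uses the built-in datasets::mtcars rather than the melted copy, and it assumes the axis shows the sorted unique model names, which is ggplot2's default for a character x variable.

```r
# Position of Valiant in the original row order vs. on a discrete x-axis.
# Assumption: the axis shows the sorted unique model names (ggplot2's
# default ordering for a character variable).
cars <- rownames(datasets::mtcars)

row_position  <- which(cars == "Valiant")               # 6
axis_position <- which(sort(unique(cars)) == "Valiant") # 31

c(row = row_position, axis = axis_position)
```

So a colour vector built in row order puts "red" at position 6, while the axis needs it at position 31.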

# Fixed chart
ggplot(data = subset(x = mtcars, subset = mtcars$variable == "cyl"), 
       aes(x = model, y = value)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, 
                                     colour =
                                         ifelse(as.numeric(rownames(mtcars)) == 31,
                                                "red","black")))

[Image: Fixed label]

Clearly this solution is extremely impractical. In the actual data I have a large number of observations with multiple columns (geo, gender, indicator, value, etc.). That data is subsequently filtered via subset and different options are passed to the aes settings. Trying to figure out which row should be coloured is a nightmare. I'm looking for a solution that would enable me to:

Answers (1)

mathematical.coffee

The reason the first one mismatches is that mtcars$model is much longer than the subset you are plotting: the colour vector ifelse(mtcars$model == "Valiant","red","black") has length 352 (32 models × 11 variables), but the subset you are plotting has only 32 labels. On top of that, the colours are matched to the labels in the order they appear on the axis, which for model is alphabetical, not in the row order of the data. The same problem exists with your second example, though there the extra elements of colour (which are all "black" anyway) are dropped, so you don't notice.
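To make the length mismatch concrete, here is a base-R sketch (no ggplot2 or reshape2 needed) that rebuilds the long data with stack() instead of melt() and inspects the colour vector from the question. It assumes that colours beyond the number of axis labels are ignored; under that assumption the single red entry among the first 32 sits at Valiant's row position (6), which on the alphabetical axis belongs to Dodge Challenger.

```r
# Rebuild the long data with base R only: stack() puts the 11 numeric
# columns one after another, in the same block order as
# melt(mtcars, id.vars = "model").
wide <- datasets::mtcars
long <- data.frame(model = rep(rownames(wide), times = ncol(wide)),
                   stack(wide))

# Colour vector computed on the full long data, as in the question
colour <- ifelse(long$model == "Valiant", "red", "black")
length(colour)                       # 352 (32 cars x 11 variables)
nrow(subset(long, ind == "cyl"))     # 32 labels on the axis

# Assuming only the first 32 colours are used for the 32 labels,
# the "red" among them is at Valiant's row position...
red_at <- which(colour[1:32] == "red")
red_at                               # 6
# ...but the axis lists the models alphabetically
sort(unique(long$model))[red_at]     # "Dodge Challenger"
```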

Unfortunately, it looks like theme(...) is not evaluated with the data's column names available to it, i.e. you can't just write colour=ifelse(model == "Valiant", "red", "black") directly in the theme(...) call.

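One alternative is to make model a factor and compare levels(...) with "Valiant": a discrete x-axis is drawn in the order of the factor levels, so a colour vector built from the levels lines up with the labels. If you have a long data frame, your id variable is most likely a factor anyway (or it would make sense for it to be one).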

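# Make model a factor: its levels (alphabetical by default) give the x-axis order
mtcars$model = factor(mtcars$model)
ggplot(data=subset(mtcars, variable == 'cyl'), aes(x=model, y=value)) +
    geom_bar(stat="identity") +
    # One colour per level, in level order, so the vector lines up with the axis labels
    theme(axis.text.x=element_text(angle=90,
              colour=ifelse(levels(mtcars$model) == 'Valiant', 'red', 'black')))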

(your problem stems from feeding subset() into ggplot as your data, and then not being able to refer back to that particular subset in the theme call. I don't know if there is a tricksy way to do this).
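One way around this, sketched below under a few assumptions, is to store the subset in a variable before plotting and derive the colours from the levels that will actually be drawn. The helper name highlight_colours() is hypothetical, not part of ggplot2. It calls droplevels() because a discrete axis drops unused levels by default (drop = TRUE), so levels(mtcars$model) would be too long whenever the subset does not contain every model.

```r
# Hypothetical helper (not a ggplot2 function): one colour per x-axis label,
# in axis order. Assumes the discrete x scale keeps its default drop = TRUE,
# so only the levels present in `data` are drawn.
highlight_colours <- function(data, x, highlight, on = "red", off = "black") {
  labels <- levels(droplevels(as.factor(data[[x]])))
  ifelse(labels %in% highlight, on, off)
}

# Example data with model as a factor (all 32 levels)
cars <- data.frame(model = factor(rownames(datasets::mtcars)),
                   cyl   = datasets::mtcars$cyl)

# Store the subset first so the theme() call can refer back to it;
# subset() keeps all 32 levels, but only 11 four-cylinder models remain
four_cyl <- subset(cars, cyl == 4)
cols <- highlight_colours(four_cyl, "model", "Volvo 142E")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(four_cyl, aes(x = model, y = cyl)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, colour = cols))
}
```

Recent ggplot2 releases may warn that vectorised input to element_text() is not officially supported, so treat this as a workaround rather than a stable interface.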
