d20chick

Reputation: 35

How to color code a categorical variable in a mosaic

I am trying to display the relationship between my categorical variables. I finally got my data into what I believe is a contingency table:

subs_count
##                [,1] [,2] [,3] [,4]
## carbohydrate      2    0   11    2
## cellulose        18    0   60    0
## chitin            0    4    0    4
## hemicellulose    21    3   10    0
## monosaccharide    3    0    0    0
## pectin            8    0    2    2
## starch            1    0    4    0

Each column represents an organism. For my plot I used:

barplot(subs_count, ylim = c(0, 100), col = predicted.substrate,
  xlab = "organism", ylab = "ESTs per substrate")

But my substrates are not consistently the same color. What am I doing wrong?

Upvotes: 1

Views: 3014

Answers (1)

Achim Zeileis

Reputation: 17193

Your data seems to be a matrix with row names, which is close to a contingency table in R but not exactly the same. Some plotting methods have additional support for tables.

More importantly, I couldn't run your code because it is unclear what predicted.substrate is. If it were a palette with 7 colors (one per substrate), then it should do what you intend (or at least what I think you intend).

I replicated your data with:

subs_count <- structure(c(2, 18, 0, 21, 3, 8, 1, 0, 0,
  4, 3, 0, 0, 0, 11, 60, 0, 10, 0, 2, 4, 2, 0, 4, 0, 0, 2, 0),
  .Dim = c(7L, 4L), .Dimnames = list(c("carbohydrate", "cellulose",
  "chitin", "hemicellulose", "monosaccharide", "pectin", "starch"), NULL))

Then I transformed it into a table:

subs_count <- as.table(subs_count)
names(dimnames(subs_count)) <- c("EST", "Organism")
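To see what this conversion changes, a quick check along the following lines may help. It is only a sketch on a two-row toy matrix; the names `m` and `tab` are illustrative and not part of the code above.

```r
# Toy matrix with row names only (a small excerpt-sized example)
m <- matrix(c(2, 18, 11, 60), nrow = 2,
  dimnames = list(c("carbohydrate", "cellulose"), NULL))

# Convert to a table and label the two dimensions
tab <- as.table(m)
names(dimnames(tab)) <- c("EST", "Organism")

is.table(m)    # FALSE: a plain matrix
is.table(tab)  # TRUE: the object now has class "table"
dimnames(tab)  # as.table() fills the missing column names with letters
```

Table-aware plotting methods can then pick up the dimension names for axis labels and legends.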

Then I used a qualitative palette from the colorspace package:

subs_pal <- colorspace::qualitative_hcl(7)

With this palette, your barplot looks reasonable:

barplot(subs_count, ylim = c(0,100), col = subs_pal,
  xlab = "organism", ylab = "ESTs per substrate", legend = TRUE)
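If the goal is to make the substrate-to-color mapping explicit, one option is to name the palette by row. The following is a minimal sketch that assumes base R's `hcl.colors()` (R 3.6.0 or later) as a stand-in for colorspace; `counts` and `pal` are illustrative names, and the organism labels "1" to "4" are placeholders.

```r
# Counts from the question (rows = substrates, columns = organisms)
counts <- as.table(matrix(
  c(2, 18, 0, 21, 3, 8, 1,
    0, 0, 4, 3, 0, 0, 0,
    11, 60, 0, 10, 0, 2, 4,
    2, 0, 4, 0, 0, 2, 0),
  nrow = 7,
  dimnames = list(
    EST = c("carbohydrate", "cellulose", "chitin", "hemicellulose",
      "monosaccharide", "pectin", "starch"),
    Organism = as.character(1:4))))  # placeholder organism labels

# One color per substrate, named so the mapping can be looked up
pal <- setNames(hcl.colors(nrow(counts), palette = "Dark 3"),
  rownames(counts))

# barplot() colors the stacked segments in row order, so a palette
# given in row order assigns each substrate the same color in every bar
barplot(counts, ylim = c(0, 100), col = pal[rownames(counts)],
  xlab = "organism", ylab = "ESTs per substrate", legend.text = TRUE)
```

Passing a vector with more (or fewer) entries than there are rows is a likely way to end up with colors that do not line up with the substrates, because `col` is simply recycled over the segments.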


And a mosaic display (as indicated in your title) would be:

mosaicplot(t(subs_count), col = subs_pal, off = 5, las = 1, main = "")


For visualizing patterns of dependence (or rather departures from independence), a mosaic plot shaded with residuals from the independence model might be even more useful:

mosaicplot(t(subs_count), shade = TRUE, off = 5, las = 1, main = "")
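The shading reflects residuals from the independence model (Pearson residuals by default in `mosaicplot()`, with cut points at absolute values of 2 and 4). As a rough sketch, the same quantities can be computed by hand to inspect the numbers behind the colors; `counts`, `expected` and `pearson` are illustrative names.

```r
# Counts from the question (rows = substrates, columns = organisms)
counts <- matrix(
  c(2, 18, 0, 21, 3, 8, 1,
    0, 0, 4, 3, 0, 0, 0,
    11, 60, 0, 10, 0, 2, 4,
    2, 0, 4, 0, 0, 2, 0),
  nrow = 7,
  dimnames = list(c("carbohydrate", "cellulose", "chitin",
    "hemicellulose", "monosaccharide", "pectin", "starch"), NULL))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (counts - expected) / sqrt(expected)
round(pearson, 2)

# Cells whose residual exceeds 2 in absolute value
which(abs(pearson) > 2, arr.ind = TRUE)
```

Positive residuals mark cells with more ESTs than independence would predict, negative residuals mark cells with fewer.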


More refined versions of shaded mosaic displays are available in package vcd (see doi:10.18637/jss.v017.i03).
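As a starting point with vcd, something like the following might work. It is a hedged sketch that assumes the vcd package is installed and uses only `mosaic()` with its `shade` and `legend` arguments; `counts` is an illustrative name and the organism labels are placeholders.

```r
# Counts from the question (rows = substrates, columns = organisms)
counts <- as.table(matrix(
  c(2, 18, 0, 21, 3, 8, 1,
    0, 0, 4, 3, 0, 0, 0,
    11, 60, 0, 10, 0, 2, 4,
    2, 0, 4, 0, 0, 2, 0),
  nrow = 7,
  dimnames = list(
    EST = c("carbohydrate", "cellulose", "chitin", "hemicellulose",
      "monosaccharide", "pectin", "starch"),
    Organism = as.character(1:4))))  # placeholder organism labels

# Draw the shaded mosaic only if vcd is available
if (requireNamespace("vcd", quietly = TRUE)) {
  # Transpose so that organisms are split first, as in the plots above
  vcd::mosaic(as.table(t(counts)), shade = TRUE, legend = TRUE)
}
```

With seven substrates the default labels may overlap, so some adjustment of the labeling is likely to be needed.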

Upvotes: 3
