Jeroen Ooms
Jeroen Ooms

Reputation: 32978

How to create a Marimekko/Mosaic plot in ggplot2

The Marimekko/Mosaic plot is a nice default plot when both x and y are categorical variables. What is the best way to create these using ggplot?

example

The only reference I could find was this 4yo blog post but this seems a bit outdated. Are there any better or easier implementations avaialable by now? The GGally package has a function ggally_ratio but this produces something quite different:

ggally

Upvotes: 46

Views: 30611

Answers (7)

Z.Lin
Z.Lin

Reputation: 29085

I had the same issue for a project some time back. My solution was to use geom_bar together with the scales="free_x", space="free_x" option in facet_grid to accommodate different bar widths:

# using diamonds dataset for illustration
df <- diamonds |>
  group_by(cut, clarity) |>
  summarise(count = n()) |>
  mutate(cut.count = sum(count),
         prop = count/sum(count)) |>
  ungroup()
    
ggplot(df,
       aes(x = cut, y = prop, width = cut.count, fill = clarity)) +
  geom_bar(stat = "identity", position = "fill", colour = "black") +
  # geom_text(aes(label = scales::percent(prop)), position = position_stack(vjust = 0.5)) + # if labels are desired
  facet_grid(~cut, scales = "free_x", space = "free_x") +
  scale_fill_brewer(palette = "RdYlGn") +
  # theme(panel.spacing.x = unit(0, "npc")) + # if no spacing preferred between bars
  theme_void() 

marimekko plot

Upvotes: 27

cpsyctc
cpsyctc

Reputation: 101

Thanks all who created this entry which really helped me as ggmosaic wasn't doing what I wanted (and not labelling axes properly). The nice function from Z.Lin throws a warning sort of explained in https://github.com/tidyverse/ggplot2/issues/3142 which seems to say that warning, which is technically untrue in its content, is really warning us that the ggplotocracy, bless and thank them, feel that geom_bar shouldn't really have variable widths. I guess I see the point so I went for the function from Jake Fisher and tweaked it to my own needs. In case it's useful to others, here it is:

makeplot_mosaic2 <- function(data, x, y, statDigits = 1, residDigits = 1, pDigits = 3, ...){
  ### from https://stackoverflow.com/questions/19233365/how-to-create-a-marimekko-mosaic-plot-in-ggplot2,
  ### this from Jake Fisher (I think)
  xvar <- deparse(substitute(x))
  yvar <- deparse(substitute(y))
  mydata <- data[c(xvar, yvar)]
  mytable <- table(mydata)
  
  widths <- c(0, cumsum(apply(mytable, 1, sum)))
  heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))})
  
  alldata <- data.frame()
  allnames <- data.frame()
  for(i in 1:nrow(mytable)){
    for(j in 1:ncol(mytable)){
      alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]))
    }
  }
  colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")
  
  alldata[[xvar]] <- rep(dimnames(mytable)[[1]],rep(ncol(mytable), nrow(mytable)))
  alldata[[yvar]] <- rep(dimnames(mytable)[[2]],nrow(mytable))
  
  chisq <- chisq.test(mytable)
  df <- chisq$parameter
  pval <- chisq$p.value
  chisqval <- chisq$statistic
  # stdResids <- chisq$stdres
  alldata$xcent <- (alldata$xmin + alldata$xmax)/2
  alldata$ycent <- (alldata$ymin + alldata$ymax)/2
  alldata$stdres <- round(as.vector(t(chisq$stdres)), residDigits)
  # print(chisq$stdres)
  # print(alldata)
  
  titleTxt1 <- paste0("Mosaic plot of ",
                      yvar,
                      " against ",
                      xvar,
                      ", ")
  titleTxt2 <- paste0("chisq(",
                     df,
                     ") = ",
                     round(chisqval, statDigits),
                     ", p = ",
                     format.pval(pval, digits = pDigits))
  titleTxt <- paste0(titleTxt1, titleTxt2)
  subTitleTxt <- "Cell labels are standardised residuals"
  
  ggplot(data  = alldata, 
         aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) + 
    geom_rect(color="black", aes_string(fill=yvar)) +
    geom_text(aes(x = xcent, y = ycent, label = stdres)) +
    xlab(paste0("Count of '", 
                xvar,
                "', total = ",
                max(alldata$xmax))) + # tweaked by CE
    ylab(paste0("Proportion of '", 
                yvar,
                "' per level of '",
                xvar,
                "'")) +
    ggtitle(titleTxt,
            subtitle = subTitleTxt) +
    theme_bw() +
    theme(plot.title = element_text(hjust = .5),
          plot.subtitle = element_text(hjust = .5))
}

makeplot_mosaic2(mtcars, vs, gear)
makeplot_mosaic2(diamonds, cut, clarity)

Upvotes: 2

Jake Fisher
Jake Fisher

Reputation: 3310

Following user2030503's suggestion, here's a version that uses ggmosaic. (Note that ggplot 3.0 broke some piece of ggmosaic, so you need the most recent version.)

library(tidyverse)
library(ggmosaic)

# Data copied from linked blog post
df <- data.frame(
  segment = LETTERS[1:4],
  segpct = c(40, 30, 20, 10),
  Alpha = c(60, 40, 30, 25), 
  Beta = c(25, 30, 30, 25),
  Gamma = c(10, 20, 20, 25), 
  Delta = c(5, 10, 20, 25)
  )

# Convert to "long" for plotting
df_long <- gather(df, key = "greek_letter", value = "pct", 
                  -c("segment", "segpct")) %>% 
  mutate(
    greek_letter = factor(
      greek_letter, 
      levels = c("Alpha", "Beta", "Gamma", "Delta")
      ),
    weight = (segpct * pct) / 10000
    )

# Plot
ggplot(df_long) +
  geom_mosaic(aes(x = product(greek_letter, segment), fill = greek_letter,
                  weight = weight))

enter image description here

Upvotes: 4

user2030503
user2030503

Reputation: 3104

You may use the ggplot2 extension package called "ggmosaic" (https://github.com/haleyjeppson/ggmosaic).

Extensive tutorial with example code and visual results is given here https://cran.r-project.org/web/packages/ggmosaic/vignettes/ggmosaic.html.

Upvotes: 9

Edwin
Edwin

Reputation: 3242

I did it myself a time ago, using just geom_bar, I turned it into a general function so it should work on any two factors. enter image description here

ggMMplot <- function(var1, var2){
  require(ggplot2)
  levVar1 <- length(levels(var1))
  levVar2 <- length(levels(var2))

  jointTable <- prop.table(table(var1, var2))
  plotData <- as.data.frame(jointTable)
  plotData$marginVar1 <- prop.table(table(var1))
  plotData$var2Height <- plotData$Freq / plotData$marginVar1
  plotData$var1Center <- c(0, cumsum(plotData$marginVar1)[1:levVar1 -1]) +
    plotData$marginVar1 / 2

  ggplot(plotData, aes(var1Center, var2Height)) +
    geom_bar(stat = "identity", aes(width = marginVar1, fill = var2), col = "Black") +
    geom_text(aes(label = as.character(var1), x = var1Center, y = 1.05)) 
  }

ggMMplot(diamonds$cut, diamonds$clarity)

Upvotes: 29

stefan.schroedl
stefan.schroedl

Reputation: 866

Plotluck is a library based on ggplot2 that aims at automating the choice of plot type based on characteristics of 1-3 variables. It contains a function for mosaic plots. Example: plotluck(mtcars,vs,gear)

enter image description here

Upvotes: 6

Jeroen Ooms
Jeroen Ooms

Reputation: 32978

A first attempt. I'm not sure how to put the factor labels on the axis though.

makeplot_mosaic <- function(data, x, y, ...){
  xvar <- deparse(substitute(x))
  yvar <- deparse(substitute(y))
  mydata <- data[c(xvar, yvar)];
  mytable <- table(mydata);
  widths <- c(0, cumsum(apply(mytable, 1, sum)));
  heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))});

  alldata <- data.frame();
  allnames <- data.frame();
  for(i in 1:nrow(mytable)){
    for(j in 1:ncol(mytable)){
      alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]));
    }
  }
  colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")

  alldata[[xvar]] <- rep(dimnames(mytable)[[1]],rep(ncol(mytable), nrow(mytable)));
  alldata[[yvar]] <- rep(dimnames(mytable)[[2]],nrow(mytable));

  ggplot(alldata, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) + 
    geom_rect(color="black", aes_string(fill=yvar)) +
    xlab(paste(xvar, "(count)")) + ylab(paste(yvar, "(proportion)"));
}

Example:

makeplot_mosaic(mtcars, vs, gear)

example

Upvotes: 13

Related Questions