Karsten W.

Reputation: 18500

Create a matrix of scatterplots (pairs() equivalent) in ggplot2

Is it possible to plot a matrix of scatter plots with ggplot2, using ggplot's nice features such as mapping additional factors to color, shape, etc., and adding smoothers?

I am thinking of something similar to the base function pairs().

Upvotes: 146

Views: 105552

Answers (6)

rosie-betzler

Reputation: 121

If you only want to use ggplot2 for plotting, here is a solution similar to the one proposed by @mjktfw, but with shorter and perhaps cleaner code:

library(tidyr)
library(dplyr)
library(ggplot2)
data(iris)

# Add an id so that each observation can be matched up again after pivoting
iris <- iris |> 
  mutate(id = row_number()) 

# Prepare data to be plotted on the x axis
x_vars <- pivot_longer(data = iris,
                       cols = Sepal.Length:Petal.Width,
                       names_to = "variable_x",
                       values_to = "x")

# Prepare data to be plotted on the y axis  
y_vars <- pivot_longer(data = iris,
                       cols = Sepal.Length:Petal.Width,
                       names_to = "variable_y",
                       values_to = "y") 

# Join data for x and y axes and make plot (the relationship argument needs dplyr >= 1.1.0)
full_join(x_vars, y_vars, 
          by = c("id", "Species"),
          relationship = "many-to-many") |>
  ggplot() + 
  aes(x = x, y = y, color = Species) +
  geom_point() +
  facet_grid(c("variable_x", "variable_y")) 
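The question also asks about smoothers. Below is a minimal sketch that extends the same join-based approach, assuming dplyr >= 1.1.0 (for `relationship =`) and a recent tidyr and ggplot2. It drops the diagonal panels (each variable against itself), adds a per-Species linear smoother, and uses `facet_grid(rows = vars(...), cols = vars(...))` with `scales = "free"`. The object names `iris_id` and `pairs_long` are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Number each observation so the two long tables can be matched back up
iris_id <- mutate(iris, id = row_number())

long_x <- pivot_longer(iris_id, Sepal.Length:Petal.Width,
                       names_to = "variable_x", values_to = "x")
long_y <- pivot_longer(iris_id, Sepal.Length:Petal.Width,
                       names_to = "variable_y", values_to = "y")

# Pair every x variable with every y variable for the same observation,
# then drop the diagonal (a variable plotted against itself)
pairs_long <- inner_join(long_x, long_y, by = c("id", "Species"),
                         relationship = "many-to-many") |>
  filter(variable_x != variable_y)

p <- ggplot(pairs_long, aes(x = x, y = y, colour = Species)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # One column per x variable, one row per y variable
  facet_grid(rows = vars(variable_y), cols = vars(variable_x), scales = "free")
p
```

With facet_grid, `scales = "free"` frees x per column and y per row, so every panel in a column shares one x variable's range and every panel in a row shares one y variable's range, which matches the layout of a pairs() matrix. `inner_join` is used instead of `full_join`; with these inputs both return the same rows, because every id/Species combination appears in both tables.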


Upvotes: 1

gbt

Reputation: 809

A bit late, but here is an alternative that does not use dplyr:

library("ggplot2")
library("reshape")

# what vars to plot
vars_to_plot <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# melt the table
melted <- melt(iris[, c("Species", vars_to_plot)])

# define empty vector
final_all <- vector()

# for each variable of interest
for (a_var in vars_to_plot) {

    # get its actual values
    temp <- iris[, a_var]
    
    # replicate them once for each variable still left in 'melted'
    temp_col <- rep(temp, length(unique(melted$variable)))
    
    # pair them with the melted values and append the block to the result
    final_all <- rbind.data.frame(final_all, cbind(melted, var=rep(a_var, length(temp_col)), temp_col))
    
    # remove the variable that was just added to the final table
    melted <- melted[-which(melted$variable==a_var), ]
}

# remove self-comparisons (a variable against itself), if needed
final_no_dup <- final_all[-which(final_all$variable==final_all$var), ]

# plot
ggplot_pairs <- ggplot(final_no_dup, aes(x=value, y=temp_col, fill=Species)) +
    geom_point(shape=21, size=5, color="black", stroke=3) +
    facet_wrap(variable~var, scales='free', labeller=label_wrap_gen(multi_line=FALSE)) +
    xlab("") +
    ylab("") +
    guides(fill=guide_legend(override.aes=list(shape=21))) +
    theme_bw()

plot(ggplot_pairs)
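The loop above ends up keeping each pair of variables only once. For comparison, here is a sketch of the same lower-triangle idea using only base R plus ggplot2 (no reshape or dplyr): `combn()` lists each unordered pair of variables once, and one block of rows is built per pair. The column names `xvar` and `yvar` are illustrative choices, not taken from the answer above.

```r
library(ggplot2)

vars_to_plot <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Each unordered pair of variables, listed once (6 pairs for 4 variables)
var_pairs <- combn(vars_to_plot, 2, simplify = FALSE)

# One block of rows per pair: the x values, the y values and both variable names
pairs_long <- do.call(rbind, lapply(var_pairs, function(v) {
  data.frame(Species = iris$Species,
             xvar = v[1], x = iris[[v[1]]],
             yvar = v[2], y = iris[[v[2]]])
}))

p <- ggplot(pairs_long, aes(x = x, y = y, fill = Species)) +
  geom_point(shape = 21, colour = "black") +
  facet_wrap(yvar ~ xvar, scales = "free",
             labeller = label_wrap_gen(multi_line = FALSE)) +
  labs(x = NULL, y = NULL) +
  theme_bw()
p
```

For four variables, `combn()` yields six pairs, so `pairs_long` has 6 × 150 = 900 rows and the plot has six panels.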

Upvotes: 1

epo3

Reputation: 3121

Try the scatterPlotMatrix package. It's very flexible and produces nice-looking interactive charts.

library(scatterPlotMatrix)
scatterPlotMatrix(iris, zAxisDim = "Species")


Upvotes: 5

mjktfw

Reputation: 870

If you want a plain ggplot object (rather than the ggmatrix object that ggpairs() returns), one solution is to melt the data twice and then plot with faceting. Here facet_wrap is better than facet_grid at limiting the plotted area, provided the scales = 'free' argument is supplied.

library(ggplot2)
library(dplyr)
library(tidyr)

gatherpairs <- function(data, ..., 
                        xkey = '.xkey', xvalue = '.xvalue',
                        ykey = '.ykey', yvalue = '.yvalue',
                        na.rm = FALSE, convert = FALSE, factor_key = FALSE) {
  vars <- quos(...)
  xkey <- enquo(xkey)
  xvalue <- enquo(xvalue)
  ykey <- enquo(ykey)
  yvalue <- enquo(yvalue)

  data %>% {
    cbind(gather(., key = !!xkey, value = !!xvalue, !!!vars,
                 na.rm = na.rm, convert = convert, factor_key = factor_key),
          select(., !!!vars)) 
  } %>% gather(., key = !!ykey, value = !!yvalue, !!!vars,
               na.rm = na.rm, convert = convert, factor_key = factor_key)
}

iris %>% 
  gatherpairs(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>% {
  ggplot(., aes(x = .xvalue, y = .yvalue, color = Species)) +
      geom_point() + 
      geom_smooth(method = 'lm') +
      facet_wrap(.xkey ~ .ykey, ncol = length(unique(.$.ykey)), scales = 'free', labeller = label_both) +
      scale_color_brewer(type = 'qual')
}
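gather() is superseded in current tidyr. Here is a sketch of the same double-melt idea built on pivot_longer(). `pivot_pairs()` is a hypothetical helper written for this example (it is not a tidyr or dplyr function), and the `{{ }}` embracing syntax assumes a reasonably recent tidyverse.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: a pivot_longer() version of gatherpairs()
pivot_pairs <- function(data, cols) {
  # Row id so each observation can be matched back to its own values
  data <- mutate(data, .row = row_number())
  value_cols <- select(data, .row, {{ cols }})

  data |>
    # First pass: one row per (observation, x variable)
    pivot_longer({{ cols }}, names_to = ".xkey", values_to = ".xvalue") |>
    # Bring the original value columns back so they can be pivoted again
    left_join(value_cols, by = ".row") |>
    # Second pass: one row per (observation, x variable, y variable)
    pivot_longer({{ cols }}, names_to = ".ykey", values_to = ".yvalue")
}

iris_pairs <- pivot_pairs(iris, Sepal.Length:Petal.Width)

p <- ggplot(iris_pairs, aes(x = .xvalue, y = .yvalue, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(.xkey ~ .ykey, ncol = 4, scales = "free")
p
```

As with gatherpairs(), the result still contains the diagonal pairs (a variable against itself), and the plot is an ordinary ggplot object that can be modified with further layers.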


Upvotes: 24

naught101

Reputation: 19563

I keep wanting to do this, but plotmatrix is crap. Hadley recommends using the GGally package instead. It has a function, ggpairs, that is a vastly improved pairs plot (it lets you use non-continuous variables in your data frame). It draws a different kind of plot in each square, depending on the variable types:

library(GGally)
ggpairs(iris, aes(colour = Species, alpha = 0.4))
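ggpairs() can also cover the smoother part of the question. A sketch that restricts the matrix to the four numeric columns and draws points with a linear smoother in the lower triangle; `columns`, `lower` and `wrap()` come from GGally, while the alpha and size values are arbitrary choices for illustration:

```r
library(GGally)  # also attaches ggplot2, which provides aes()

p <- ggpairs(
  iris,
  columns = 1:4,                      # only the four numeric measurements
  mapping = aes(colour = Species),    # Species still drives the colour
  # Lower triangle: points plus a per-group linear fit (ggally_smooth)
  lower = list(continuous = wrap("smooth", alpha = 0.3, size = 0.5))
)
p
```

Keep in mind that ggpairs() returns a ggmatrix object rather than a plain ggplot; a single panel can be pulled out with getPlot(p, i, j).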


Upvotes: 261

Matt Bannert

Reputation: 28274

You might want to try plotmatrix:

  library(ggplot2)
  data(mtcars)
  plotmatrix(mtcars[,1:3])

To me, mpg (the first column in mtcars) should not be a factor. I haven't checked, but there's no reason it should be one. In any case, I get a scatter plot :)


Note: For future reference, the plotmatrix() function has been replaced by the ggpairs() function from the GGally package, as @naught101 suggests in another answer to this question.
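Since plotmatrix() is no longer available in current ggplot2, here is a sketch of the same mtcars example with GGally's ggpairs(), together with a quick check that the first three columns really are numeric rather than factors:

```r
library(GGally)

data(mtcars)

# Every column of mtcars is stored as numeric, so mpg is not a factor;
# this prints "numeric" for mpg, cyl and disp
sapply(mtcars[, 1:3], class)

# Replacement for plotmatrix(mtcars[, 1:3])
p <- ggpairs(mtcars[, 1:3])
print(p)
```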

Upvotes: 40
