Reputation: 18500
Is it possible to plot a matrix of scatter plots with ggplot2, using ggplot's nice features like mapping additional factors to color, shape, etc. and adding a smoother?
I am thinking about something similar to the base function pairs.
Upvotes: 146
Views: 105552
Reputation: 121
If you only want to use ggplot2 for plotting, here is a solution similar to the one proposed by @mjktfw, but with shorter, perhaps cleaner, code:
library(tidyr)
library(dplyr)
library(ggplot2)
data(iris)
# Create id so that observations can be re-identified
iris <- iris |>
  mutate(id = row_number())
# Prepare data to be plotted on the x axis
x_vars <- pivot_longer(data = iris,
                       cols = Sepal.Length:Petal.Width,
                       names_to = "variable_x",
                       values_to = "x")
# Prepare data to be plotted on the y axis
y_vars <- pivot_longer(data = iris,
                       cols = Sepal.Length:Petal.Width,
                       names_to = "variable_y",
                       values_to = "y")
# Join data for x and y axes and make plot
full_join(x_vars, y_vars,
          by = c("id", "Species"),
          relationship = "many-to-many") |>
  ggplot() +
  aes(x = x, y = y, color = Species) +
  geom_point() +
  facet_grid(c("variable_x", "variable_y"))
Upvotes: 1
Reputation: 809
A bit late, but here is an alternative that does not use dplyr:
library("ggplot2")
library("reshape")
# what vars to plot
vars_to_plot <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
# melt the table
melted <- melt(iris[, c("Species", vars_to_plot)])
# define empty vector
final_all <- vector()
# for each variable of interest
for (a_var in vars_to_plot) {
  # get its actual values
  temp <- iris[, a_var]
  # replicate them for each variable
  temp_col <- rep(temp, length(unique(melted$variable)))
  # rbind them
  final_all <- rbind.data.frame(final_all, cbind(melted, var=rep(a_var, length(temp_col)), temp_col))
  # remove the variable that was just added to the final table
  melted <- melted[-which(melted$variable==a_var), ]
}
# remove duplicate comparisons, if needed
final_no_dup <- final_all[-which(final_all$variable==final_all$var), ]
# plot
ggplot_pairs <- ggplot(final_no_dup, aes(x=value, y=temp_col, fill=Species)) +
  geom_point(shape=21, size=5, color="black", stroke=3) +
  facet_wrap(variable~var, scales='free', labeller=label_wrap_gen(multi_line=FALSE)) +
  xlab("") +
  ylab("") +
  guides(fill=guide_legend(override.aes=list(shape=21))) +
  theme_bw()
plot(ggplot_pairs)
Upvotes: 1
Reputation: 3121
Try scatterPlotMatrix. It is very flexible and produces nice-looking interactive charts.
library(scatterPlotMatrix)
scatterPlotMatrix(iris, zAxisDim = "Species")
Upvotes: 5
Reputation: 870
If one wants to obtain a ggplot object (not a ggmatrix, as in the case of ggpairs()), the solution is to melt the data twice and then use ggplot with faceting. facet_wrap is better than facet_grid at limiting the plotted area, provided the scales = 'free' parameter is supplied.
require(ggplot2)
require(dplyr)
require(tidyr)
gatherpairs <- function(data, ...,
                        xkey = '.xkey', xvalue = '.xvalue',
                        ykey = '.ykey', yvalue = '.yvalue',
                        na.rm = FALSE, convert = FALSE, factor_key = FALSE) {
  vars <- quos(...)
  xkey <- enquo(xkey)
  xvalue <- enquo(xvalue)
  ykey <- enquo(ykey)
  yvalue <- enquo(yvalue)
  data %>% {
    cbind(gather(., key = !!xkey, value = !!xvalue, !!!vars,
                 na.rm = na.rm, convert = convert, factor_key = factor_key),
          select(., !!!vars))
  } %>% gather(., key = !!ykey, value = !!yvalue, !!!vars,
               na.rm = na.rm, convert = convert, factor_key = factor_key)
}
iris %>%
  gatherpairs(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>% {
    ggplot(., aes(x = .xvalue, y = .yvalue, color = Species)) +
      geom_point() +
      geom_smooth(method = 'lm') +
      facet_wrap(.xkey ~ .ykey, ncol = length(unique(.$.ykey)), scales = 'free', labeller = label_both) +
      scale_color_brewer(type = 'qual')
  }
Upvotes: 24
Reputation: 19563
I keep wanting to do this, but plotmatrix is crap. Hadley recommends using the GGally package instead. Its ggpairs function is a vastly improved pairs plot that also handles non-continuous variables in your data frame. It draws a different kind of plot in each cell, depending on the types of the two variables:
library(GGally)
ggpairs(iris, aes(colour = Species, alpha = 0.4))
Upvotes: 261
Reputation: 28274
You might want to try plotmatrix:
library(ggplot2)
data(mtcars)
plotmatrix(mtcars[,1:3])
To me, mpg (the first column in mtcars) should not be a factor. I haven't checked, but there's no reason why it should be one. However, I do get a scatter plot :)
Note: For future reference, the plotmatrix() function has been replaced by the ggpairs() function from the GGally package, as @naught101 suggests in another response to this question.
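For ggplot2 versions where plotmatrix() is no longer available, a rough equivalent of the call above might look like the sketch below. It assumes the GGally package is installed; the columns argument selects the same three mtcars columns, and the name p_mtcars is only for illustration.

```r
library(GGally)
data(mtcars)

# Sketch: ggpairs() stand-in for plotmatrix(mtcars[, 1:3])
p_mtcars <- ggpairs(mtcars, columns = 1:3)
print(p_mtcars)
```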
Upvotes: 40