Reputation:
I'd like to correlate the same column of a dataframe for points with distinct row values. For example, in the iris
dataframe, I'd like to make three scatter plots comparing Petal.Length
of virginica
with that of versicolor
, setosa
with virginica
and versicolor
with setosa
. I want it to appear just like a normal facet_grid
or facet_wrap
plot. For example, I can do:
ggplot(iris) + geom_point(aes(x=Petal.Length, y=Petal.Length)) + facet_grid(~Species)
This is not what I want, since it's plotting Petal.Length
of each species against itself, but I want the plot to appear like this, except where I handcode which species to compare to what other species. How can this be done in ggplot
? Thanks.
Upvotes: 3
Views: 9530
Reputation: 14667
Your question seems to be about comparing a single variable measured on many individuals that fall into multiple categories. Given your example using the iris
dataset, a scatterplot is probably not a useful visualization.
Here I offer several univariate visualizations available in ggplot2
. I hope one of these is helpful:
library(ggplot2)
plot_1 = ggplot(iris, aes(x=Petal.Length, colour=Species)) +
geom_density() +
labs(title="Density plots")
plot_2 = ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_histogram(colour="grey30", binwidth=0.15) +
facet_grid(Species ~ .) +
labs(title="Histograms")
plot_3 = ggplot(iris, aes(y=Petal.Length, x=Species)) +
geom_point(aes(colour=Species),
position=position_jitter(width=0.05, height=0.05)) +
geom_boxplot(fill=NA, outlier.colour=NA) +
labs(title="Boxplots")
plot_4 = ggplot(iris, aes(y=Petal.Length, x=Species, fill=Species)) +
geom_dotplot(binaxis="y", stackdir="center", binwidth=0.15) +
labs(title="Dot plots")
library(gridExtra)
part_1 = arrangeGrob(plot_1, plot_2, heights=c(0.4, 0.6))
part_2 = arrangeGrob(plot_3, plot_4, nrow=2)
parts_12 = arrangeGrob(part_1, part_2, ncol=2, widths=c(0.6, 0.4))
ggsave(file="plots.png", parts_12, height=6, width=10, units="in")
Upvotes: 12
Reputation: 118799
It is better to group the data first. I'd do something like this:
# get Petal.Length for each species separately
df1 <- subset(iris, Species == "virginica", select=c(Petal.Length, Species))
df2 <- subset(iris, Species == "versicolor", select=c(Petal.Length, Species))
df3 <- subset(iris, Species == "setosa", select=c(Petal.Length, Species))
# construct species 1 vs 2, 2 vs 3 and 3 vs 1 data
df <- data.frame(x=c(df1$Petal.Length, df2$Petal.Length, df3$Petal.Length),
y = c(df2$Petal.Length, df3$Petal.Length, df1$Petal.Length),
grp = rep(c("virginica.versicolor", "versicolor.setosa", "setosa.virginica"), each=50))
df$grp <- factor(df$grp)
# plot
require(ggplot2)
ggplot(data = df, aes(x = x, y = y)) + geom_point(aes(colour=grp)) + facet_wrap( ~ grp)
This results in:
Upvotes: 5