Camford Oxbridge
Camford Oxbridge

Reputation: 894

Contour plot via Scatter plot

Scatter plots are useless when number of plots is large.

So, e.g., using normal approximation, we can get the contour plot.

My question: Is there any package to implement the contour plot from scatter plot. enter image description here


Thank you @G5W !! I can do it !!

enter image description here

Upvotes: 1

Views: 2635

Answers (2)

G5W
G5W

Reputation: 37641

You don't offer any data, so I will respond with some artificial data, constructed at the bottom of the post. You also don't say how much data you have although you say it is a large number of points. I am illustrating with 20000 points.

You used the group number as the plotting character to indicate the group. I find that hard to read. But just plotting the points doesn't show the groups well. Coloring each group a different color is a start, but does not look very good.

plot(x,y, pch=20, col=rainbow(3)[group])

First Attempt - color the groups

Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.

plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)

Transparent points of reduced size

That looks somewhat better, but did not address your actual request. Your sample picture seems to show confidence ellipses. You can get those using the function dataEllipse from the car package.

library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
    plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)

Plot with confidence ellipses

But if there are really a lot of points, the points can still overlap so much that they are just confusing. You can also use dataEllipse to create what is basically a 2D density plot without showing the points at all. Just plot several ellipses of different sizes over each other filling them with transparent colors. The center of the distribution will appear darker. This can give an idea of the distribution for a very large number of points.

plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
    plot.points=FALSE, col=rainbow(3), group.labels=NA, 
    center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)

Density plot

You can get a more continuous look by plotting more ellipses and leaving out the border lines.

plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
    plot.points=FALSE, col=rainbow(3), group.labels=NA, 
    center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)

Smoother density plot

Please try different combinations of these to get a nice picture of your data.


Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the ellipses. You can get that by simply computing the centroids of the points in each group. So for example,

plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
        plot.points=FALSE, col=rainbow(3), group.labels=NA,
    center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)

## Now add labels
for(i in unique(group)) {
    text(mean(x[group==i]), mean(y[group==i]), labels=i) 
}

Labeled Ellipses

Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like labels=GroupNames[i].



Data

x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))

Upvotes: 3

user2554330
user2554330

Reputation: 44788

You can use hexbin::hexbin() to show very large datasets.

@G5W gave a nice dataset:

x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))

If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:

library(hexbin)
plot(hexbin(x,y))

which produces

screenshot

If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:

library(MASS)
contour(kde2d(x,y))

screenshot

Upvotes: 2

Related Questions