droops

Reputation: 41

Graphing results of dbscan in R

Any comments, suggestions, or solutions would be greatly appreciated. Thank you.

I'm using the fpc package in R to do a dbscan analysis of some very dense data (3 sets of 40,000 points in the range -3 to 6).

I've found some clusters, and I need to graph just the significant ones. The problem is that a single cluster (the first) contains about 39,000 points. I need to graph every cluster except this one.

dbscan() returns a special data type that stores all of this cluster data. It's not indexed like a data frame would be (but maybe there is a way to represent it as one?).

I can graph the dbscan object using a basic plot() call, but as noted above, this also graphs the irrelevant 39,000 points.

tl;dr: how do I graph only specific clusters of a dbscan data type?

Upvotes: 4

Views: 8094

Answers (3)

Has QUIT--Anony-Mousse

Reputation: 77495

Probably the most sensible way of plotting DBSCAN results is to use alpha shapes, with the radius set to the epsilon value. Alpha shapes are closely related to convex hulls, but they are not necessarily convex. The alpha radius controls the amount of non-convexity allowed.

This is quite closely related to the DBSCAN cluster model of density connected objects, and as such will give you a useful interpretation of the set.

As I'm not using R, I don't know about its alpha shape capabilities. A quick check on Google suggests there is a package called alphahull.

Upvotes: 0

joran

Reputation: 173697

If you look at the help page (?dbscan), it is organized like all others, into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what the function dbscan returns. In this case it is simply a list (a standard R data type) with a few components.

The cluster component is simply an integer vector, with length equal to the number of rows in your data, that indicates which cluster each observation belongs to. So you can use this vector to subset your data, extracting only the clusters you'd like, and then plot just those data points.

For example, if we use the first example from the help page:

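library(fpc)  # provides the dbscan() used here

set.seed(665544)
n <- 600
# 10 random centres, recycled across n points, plus Gaussian jitter
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- dbscan(x, 0.2)  # eps = 0.2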

we can then use the result, ds, to plot only the points in clusters 1-3:

#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
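
For the situation in the question, where one huge cluster should be left out, the same indexing can be inverted. Below is a minimal sketch, assuming the dominant cluster is labeled 1 and that label 0 marks noise points (which is how fpc's dbscan codes them). The helper name keep_other_clusters and the toy data are made up for illustration; with real results you would pass your own data and ds$cluster.

```r
# Hypothetical helper: keep only rows whose cluster label is NOT in `exclude`.
# `cluster` is an integer vector such as ds$cluster (0 = noise in fpc::dbscan).
keep_other_clusters <- function(x, cluster, exclude = c(0, 1)) {
  keep <- !(cluster %in% exclude)
  list(points = x[keep, , drop = FALSE], cluster = cluster[keep])
}

# Toy stand-in for a real data matrix and a real ds$cluster vector
set.seed(1)
x <- cbind(rnorm(10), rnorm(10))
cluster <- c(1, 1, 1, 1, 1, 2, 2, 3, 3, 0)

res <- keep_other_clusters(x, cluster)

# Colour each remaining point by its cluster label
plot(res$points, col = res$cluster, pch = 19, xlab = "x", ylab = "y")
```

Returning the filtered labels alongside the points keeps the two aligned, so the colours in the plot still correspond to the original cluster numbers.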

Upvotes: 6

nullglob

Reputation: 7023

Without knowing the specifics of dbscan, I can recommend that you look at the function smoothScatter. It is very useful for examining the main patterns in a scatterplot when you would otherwise have too many points to make sense of the data.
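
As a rough illustration, here is a sketch on simulated data. The 40,000 points below are invented to resemble the size described in the question, not taken from it. smoothScatter ships with base R's graphics package, though it relies on the recommended KernSmooth package being installed.

```r
# Simulated stand-in: 40,000 points, mostly one dense blob plus a small group
set.seed(42)
dense  <- cbind(rnorm(39000, mean = 1, sd = 1),   rnorm(39000, mean = 1, sd = 1))
sparse <- cbind(rnorm(1000,  mean = 4, sd = 0.3), rnorm(1000,  mean = 4, sd = 0.3))
pts <- rbind(dense, sparse)

# Plot a smoothed 2D density instead of 40,000 overlapping symbols;
# nrpoints controls how many points from the sparsest regions are drawn on top
smoothScatter(pts, nrpoints = 100, xlab = "x", ylab = "y")
```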

Upvotes: 1
