Reputation: 41
Any comments, suggestions, or solutions will be greatly appreciated. Thank you.
I'm using the fpc package in R to do a DBSCAN analysis of some very dense data (3 sets of 40,000 points in the range -3 to 6).
I've found some clusters, and I need to graph just the significant ones. The problem is that I have a single cluster (the first) with about 39,000 points in it. I need to graph all other clusters but this one.
The dbscan() function creates a special data type to store all of this cluster data. It's not indexed like a data frame would be (but maybe there is a way to represent it as one?).
I can graph the dbscan object using a basic plot() call. But, as I said, this will also graph the irrelevant 39,000 points.
tl;dr:
How do I graph only specific clusters of a dbscan data type?
Upvotes: 4
Views: 8094
Reputation: 77495
Probably the most sensible way of plotting DBSCAN results is to use alpha shapes, with the radius set to the epsilon value. Alpha shapes are closely related to convex hulls, but they are not necessarily convex. The alpha radius controls how much non-convexity is allowed.
This is closely related to the DBSCAN cluster model of density-connected objects, and as such it will give you a useful interpretation of the set.
As I'm not using R, I don't know about R's alpha shape capabilities. From a quick check on Google, there supposedly is a package called alphahull.
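A minimal sketch of the alpha-shape idea in R, assuming the alphahull package's ashape() function (whose alpha argument is the radius of the balls that carve out the shape) and reusing the example data from the fpc dbscan help page. Noise points (cluster 0) are skipped, each remaining cluster gets its own alpha shape with alpha set to eps, and clusters that cannot be triangulated (too few or collinear points) are simply left as plain points. This has not been checked against the asker's real data.

```r
# Sketch: outline each DBSCAN cluster with an alpha shape of radius eps.
# Assumes the fpc and alphahull packages are installed.
library(fpc)
library(alphahull)

# Example data from the fpc::dbscan help page
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))

eps <- 0.2
ds <- fpc::dbscan(x, eps)

# Empty plot spanning all the data
plot(x, type = "n", xlab = "x", ylab = "y")

# Skip noise (cluster 0) and draw one alpha shape per remaining cluster
for (k in setdiff(sort(unique(ds$cluster)), 0)) {
  pts <- unique(x[ds$cluster == k, , drop = FALSE])  # drop duplicate rows to be safe
  points(pts, pch = 20, cex = 0.5, col = k + 1)
  if (nrow(pts) < 3) next                            # too few points to triangulate
  shape <- tryCatch(ashape(pts[, 1], pts[, 2], alpha = eps),
                    error = function(e) NULL)        # e.g. collinear points
  if (!is.null(shape)) plot(shape, add = TRUE, wpoints = FALSE)
}
```

Setting alpha much larger than eps pushes each outline toward the cluster's convex hull; setting it smaller breaks a cluster into fragments.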
Upvotes: 0
Reputation: 173697
If you look at the help page (?dbscan), it is organized like all others into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what the function dbscan returns. In this case it is simply a list (a standard R data type) with a few components.
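A short sketch of how to confirm this interactively, assuming fpc's dbscan() and the help-page example data. It also shows one way to get the data-frame view the question asks about: bind the coordinates and the cluster labels together yourself (the column names x, y and cluster are just illustrative choices).

```r
# Sketch: inspect what fpc::dbscan() returns and attach labels to the data.
library(fpc)

# Example data from the fpc::dbscan help page
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- fpc::dbscan(x, 0.2)

class(ds)          # "dbscan"
is.list(ds)        # TRUE: an ordinary list underneath the class attribute
names(ds)          # component names, including "cluster"
table(ds$cluster)  # number of points per cluster; 0 means noise

# Combine coordinates and cluster labels into a regular data frame
df <- data.frame(x = x[, 1], y = x[, 2], cluster = ds$cluster)
head(df)
```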
The cluster component is simply an integer vector, with one entry per row of your data, indicating which cluster each observation belongs to (noise points are coded as 0). So you can use this vector to subset your data, keep only the clusters you'd like, and then plot just those data points.
For example, if we use the first example from the help page:
library(fpc)
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n, sd=0.2))
ds <- dbscan(x, 0.2)
we can then use the result, ds, to plot only the points in clusters 1 to 3:
#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
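For the question's actual case, showing everything except one very large cluster, the same subsetting works in reverse. A minimal sketch, assuming the unwanted cluster is cluster 1 and that the noise points (cluster 0) should be hidden too; hide is just an illustrative variable name, so adjust it to whichever labels you want to drop.

```r
# Sketch: plot every cluster except cluster 1 and the noise points.
library(fpc)

# Example data from the fpc::dbscan help page
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- fpc::dbscan(x, 0.2)

hide <- c(0, 1)                     # 0 = noise, 1 = the cluster to leave out
keep <- !(ds$cluster %in% hide)     # logical index over the rows of x

plot(x[keep, , drop = FALSE],
     col = ds$cluster[keep] + 1,    # colour points by cluster label
     pch = 20, xlab = "x", ylab = "y")
```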
Upvotes: 6
Reputation: 7023
Without knowing the specifics of dbscan, I can recommend that you look at the function smoothScatter. It is very useful for examining the main patterns in a scatterplot when you would otherwise have too many points to make sense of the data.
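One way to combine this with the cluster labels, sketched under the assumption that fpc's dbscan() and the help-page example data are used: draw the full data set with smoothScatter() as a density image, then overlay only the clusters of interest with points(). The choice of clusters 2 and 3 here is purely hypothetical.

```r
# Sketch: density background for all points, selected clusters on top.
library(fpc)

# Example data from the fpc::dbscan help page
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- fpc::dbscan(x, 0.2)

# Smoothed density image instead of thousands of overlapping dots
smoothScatter(x, xlab = "x", ylab = "y")

# Hypothetical selection of clusters to highlight
sel <- ds$cluster %in% 2:3
points(x[sel, , drop = FALSE], col = "red", pch = 20)
```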
Upvotes: 1