Janak

Reputation: 683

Clustering connected sets of points (longitude, latitude) using R

I am working with centered longitude (x) and latitude (y) data. My goal is to cluster the connected locations.

Two locations on earth, (x1,y1) and (x2,y2), are said to be connected if earth_distance((x1,y1),(x2,y2)) < 15 kilometers.

I am using the distHaversine function (from the geosphere package) in R to calculate earth distance.

Here is some sample data:

x=c(1.000000, 1.055672, 1.038712, 1.094459, 1.133179, 1.116241, 1.126053, 1.181824 ,1.377892, 5.869881, 5.925270, 5.909721)

and

y=c(1.333368,1.304790,1.347332,1.318743,1.332676,1.375229,1.572287,1.544174,2.371105,2.337032,2.383415)

also

distance <- distHaversine(c(x,y))

I wish to find the clusters formed by the connected sets of points (each connected set of points forms a cluster).

I looked at "How to cluster points and plot", but I could not solve my problem.

Any reference, suggestion or answer will be very much appreciated.

Upvotes: 0

Views: 1182

Answers (1)

Spacedman

Reputation: 94182

Maybe this. First make some coordinates:

> x=c(1.000000, 1.055672, 1.038712, 1.094459, 1.133179, 1.116241, 1.126053, 1.181824 ,1.377892, 5.869881, 5.925270)
> y=c(1.333368, 1.304790, 1.347332, 1.318743, 1.332676, 1.375229, 1.572287, 1.544174, 2.371105 ,2.337032, 2.383415)

Make them into a data frame:

> xy = data.frame(x=x,y=y)

Now use outer to loop over all pairs of rows and compute a full distance matrix with distHaversine from the geosphere package, which returns distances in metres. This does twice as much work as is really necessary, since it computes both i to j and j to i for all i and j. Anyway, it gets us a distance matrix:

> require(geosphere)
> dmat = outer(1:nrow(xy), 1:nrow(xy), function(i,j)distHaversine(xy[i,],xy[j,]))
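If geosphere is not available, the same kind of matrix can be sketched in base R with the haversine formula. The function name haversine_matrix and the variable dmat_base are hypothetical, and the radius of 6378137 m is an assumption chosen to match distHaversine's default; treat this as an illustration, not a drop-in replacement for the geosphere implementation.

```r
# Hypothetical helper: pairwise haversine distances in metres, base R only.
# lon and lat are in degrees; r is the assumed sphere radius in metres.
haversine_matrix <- function(lon, lat, r = 6378137) {
  phi    <- lat * pi / 180            # latitudes in radians
  lambda <- lon * pi / 180            # longitudes in radians
  dphi    <- outer(phi, phi, "-")     # all pairwise latitude differences
  dlambda <- outer(lambda, lambda, "-")
  a <- sin(dphi / 2)^2 +
    outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # Clamp sqrt(a) at 1 to guard against rounding error before asin
  2 * r * asin(pmin(sqrt(a), 1))
}

x <- c(1.000000, 1.055672, 1.038712, 1.094459, 1.133179, 1.116241,
       1.126053, 1.181824, 1.377892, 5.869881, 5.925270)
y <- c(1.333368, 1.304790, 1.347332, 1.318743, 1.332676, 1.375229,
       1.572287, 1.544174, 2.371105, 2.337032, 2.383415)

dmat_base <- haversine_matrix(x, y)   # 11 x 11 symmetric matrix
```

Because the differences are computed with vectorized outer calls on plain numeric vectors, this avoids indexing the data frame once per pair.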

Now we want a connectivity matrix, which marks every pair closer than 15,000 metres as connected:

> cmat = dmat < 15000

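Now we use the igraph package to build a connectivity graph object (in recent igraph releases, graph.adjacency and clusters are deprecated in favour of graph_from_adjacency_matrix and components):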

> require(igraph)
> cgraph = graph.adjacency(cmat)

You can plot this to see the cluster formation, but note these are not plotted in your x-y space:

> plot(cgraph)

Now to get the connected clusters:

> clusters(cgraph)
$membership
 [1] 1 1 1 1 1 1 2 2 3 4 4

$csize
[1] 6 2 1 2

$no
[1] 4
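The membership vector above labels the connected components of the graph. As a rough illustration of that idea without igraph, a breadth-first labelling in base R could look like the sketch below. The names label_components and adj are hypothetical, and the input is assumed to be a symmetric logical adjacency matrix such as cmat.

```r
# Hypothetical helper: label connected components of a symmetric
# logical adjacency matrix using breadth-first search.
label_components <- function(adj) {
  n <- nrow(adj)
  membership <- integer(n)            # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (membership[start] != 0L) next # already assigned to a component
    current <- current + 1L
    membership[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node  <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of the current node join its component
      nbrs <- which(adj[node, ] & membership == 0L)
      membership[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  membership
}

# Toy example: edges 1-2, 2-3 and 5-6; node 4 is isolated
adj <- diag(6) == 1
edges <- rbind(c(1, 2), c(2, 3), c(5, 6))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE             # mirror the edges to keep adj symmetric

label_components(adj)
# [1] 1 1 1 2 3 3
```

Components are numbered in order of their first member, so points 1 to 3 share label 1 even though 1 and 3 are only linked through 2.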

Which you can add to your data frame thus:

> xy$cluster = clusters(cgraph)$membership
> xy
          x        y cluster
1  1.000000 1.333368       1
2  1.055672 1.304790       1
3  1.038712 1.347332       1
4  1.094459 1.318743       1
5  1.133179 1.332676       1
6  1.116241 1.375229       1
7  1.126053 1.572287       2
8  1.181824 1.544174       2
9  1.377892 2.371105       3
10 5.869881 2.337032       4
11 5.925270 2.383415       4

And plot:

> plot(xy$x,xy$y,col=xy$cluster)

Upvotes: 2
