Reputation: 51118
I have the following problem - made abstract to bring out the key issues.
I have 10 points each which is some distance from the other. I want to
This is not euclidean space. But the distances can be summarised as follows - p(i) is point i:
p(1) p(2) p(3) p(4) p(5) p(6) p(7) p(8) p(9) p(10)
p(1) 0 2 1 3 2 3 3 2 3 4
p(2) 2 0 1 3 2 3 3 2 3 4
p(3) 1 1 0 2 0 1 2 1 2 3
p(4) 3 3 2 0 1 2 3 2 3 4
p(5) 2 2 1 1 0 1 2 1 2 3
p(6) 3 3 2 2 1 0 3 2 3 4
p(7) 3 3 2 3 2 3 0 1 2 3
p(8) 2 2 1 2 1 2 1 0 1 2
p(9) 3 3 2 3 2 3 2 1 0 1
p(10) 4 4 3 4 3 4 3 2 1 0
How would I calculate which is the center point of this cluster?
Upvotes: 8
Views: 22377
Reputation: 46
If you treat the points as a distribution, you can get the median/mean. Then, using the distance metric, you can get the the point/sample closest to the median/mean (it should have the least distance with the median/mean). This may give you more than 1 point/sample to be at the 'center'. Statistically, that may have a greater significance. Ultimately depends on what you are trying to do with it.
Upvotes: 0
Reputation: 2500
As far as I understand this looks like K Means Clustering, and what you are looking for is usually known as 'Medoids'.
See here: http://en.wikipedia.org/wiki/Medoids or here: http://en.wikipedia.org/wiki/K-medoids
Upvotes: 9
Reputation: 21663
I may be about to have that frisson that comes just before displaying utter stupidity. But doesn't this yield easily to brute force? In Python:
distances = [
[ 0 , 2 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ],
[ 2 , 0 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ],
[ 1 , 1 , 0 , 2 , 0 , 1 , 2 , 1 , 2 , 3 , ],
[ 3 , 3 , 2 , 0 , 1 , 2 , 3 , 2 , 3 , 4 , ],
[ 2 , 2 , 1 , 1 , 0 , 1 , 2 , 1 , 2 , 3 , ],
[ 3 , 3 , 2 , 2 , 1 , 0 , 3 , 2 , 3 , 4 , ],
[ 3 , 3 , 2 , 3 , 2 , 3 , 0 , 1 , 2 , 3 , ],
[ 2 , 2 , 1 , 2 , 1 , 2 , 1 , 0 , 1 , 2 , ],
[ 3 , 3 , 2 , 3 , 2 , 3 , 2 , 1 , 0 , 1 , ],
[ 4 , 4 , 3 , 4 , 3 , 4 , 3 , 2 , 1 , 0 , ],
]
currentMinimum = 99999
for point in range ( 10 ) :
distance_sum = 0
for second_point in range ( 10 ) :
if point == second_point : continue
distance_sum += distances [ point ] [ second_point ]
print '>>>>>', point, distance_sum
if distance_sum < currentMinimum :
currentMinimum = distance_sum
centre = point
print centre
Upvotes: 4
Reputation: 5673
What you're trying to do, or at least (b) belongs to Cluster Analysis. A branch of mathematics / statistics / econometrics where datapoints (e.g. points in n-dimensional space) are divided among groups or clusters. How to do this is not a trivial questions, there are many, many possible ways.
Read more at the wikipedia article on cluster analysis.
Upvotes: 1
Reputation: 7219
a)
b) no idea for now..
maybe for each p, find the closer machine.
by this logic make a graph.
than somehow (i dont know yet) divide the graph
Upvotes: 1