Reputation: 21095
This is probably more of a math question, but it will end up as code eventually, so here goes...
I have an arbitrary set of numbers that need to be grouped by nearest similarity into a fixed number of groups, as entered by the user. Both the length of the set and the value of the numbers can vary.
Here's a sample set...
1.2
1.3
0.5
0.7
1.3
1.4
0.7
0.9
1.1
1.3
The order of these items must be preserved - the final set cannot be reordered, however it can be shifted. For example, the last numbers could become the first numbers. This set is being applied to a circle, so anything that preserves the circular integrity of it is fine.
So if a user were to enter 4
as the desired number of groups, I would expect the output to be as follows:
0 =>
1.2
1.3
1 =>
0.5
0.7
2 =>
1.3
1.4
0.7
3 =>
0.9
1.1
1.3
In an even better case, the numbers could be shifted to improve their likeness in the groups. For example, by shifting the two last numbers to the beginning...
0 =>
1.1
1.3
1.2
1.3
1 =>
0.5
0.7
2 =>
1.3
1.4
3 =>
0.7
0.9
Is there an algorithm that would be useful for this? Any pointers on how to pull something like this off?
Thanks!
Upvotes: 2
Views: 153
Reputation: 3047
What you want to accomplish is a special case of Single-Linkage clustering, where your domain is linear and circular. Instead of a distance matrix, you will have a distance array. Presumably, your distance function is the absolute value of the difference of the numbers in the array.
This code is by no means the most concise, well-written C#; but it does highlight all of the important steps of the clustering algorithm:
Simple Distance Calculation
class Node
{
public double Value { get; set; }
public Node NextNode { get; set; }
public Cluster Cluster { get; set; }
}
class Cluster : List<Node>
{
}
static void Main()
{
double[] values = new double[]
{
1.2,
1.3,
0.5,
0.7,
1.3,
1.4,
0.7,
0.9,
1.1,
1.3,
};
List<Node> nodes = new List<Node>();
foreach (double value in values)
{
nodes.Add(new Node { Value = value });
}
// Put each node in a cluster by itself
foreach (Node node in nodes)
{
node.Cluster = new Cluster();
node.Cluster.Add(node);
}
// Create the cirular Linked List here
// could probably use System.Collections in some way
// but using simple self written classes for clarity
for (int n = 1; n < nodes.Count; n++)
{
nodes[n - 1].NextNode = nodes[n];
}
nodes[nodes.Count - 1].NextNode = nodes[0];
// Create a sorted distance list
List<Node> sortedNodes = new List<Node>(nodes);
sortedNodes.Sort((a, b) =>
{
var aDistToNext = Math.Abs(a.Value - a.NextNode.Value);
var bDistToNext = Math.Abs(b.Value - b.NextNode.Value);
return aDistToNext.CompareTo(bDistToNext);
});
// Register each node / cluster to the output list
List<Cluster> clusters = new List<Cluster>();
foreach (Node node in nodes)
{
clusters.Add(node.Cluster);
}
// Merge clusters until the desired number is reached
int distIdx = 0;
while (clusters.Count > 4)
{
// Obtain the two next closest nodes
var nodeA = sortedNodes[distIdx];
var nodeB = nodeA.NextNode;
// Merge the nodes into a single cluster
nodeA.Cluster.AddRange(nodeB.Cluster);
// Remove the unnecessary cluster from output set
clusters.Remove(nodeB.Cluster);
nodeB.Cluster = nodeA.Cluster;
distIdx++;
}
// Print the output results
for (int n = 0; n < clusters.Count; n++)
{
Console.WriteLine("{0} =>", n);
foreach (Node node in clusters[n])
{
Console.WriteLine("\t{0}", node.Value);
}
}
}
Now note, that the algorithm works as intended; however, it gives different results than posted in the original question. The difference is due to the ambiguous linking when inter-node distances are equal.
Output:
0 =>
0.5
0.7
1 =>
1.3
1.4
2 =>
0.7
3 =>
0.9
1.1
1.3
1.2
1.3
Advanced Distance Calculation
If you wanted to prioritize nodes of similar distance joining before connecting to nodes which had already been joined in a previous cluster you could modify the sort algorithm to give priority to unattached nodes:
sortedNodes.Sort((a, b) =>
{
var aDistToNext = Math.Abs(a.Value - a.NextNode.Value);
var bDistToNext = Math.Abs(b.Value - b.NextNode.Value);
var result = aDistToNext.CompareTo(bDistToNext);
if (result != 0)
return result;
else
{
var aNextDistToNext = Math.Abs(a.NextNode.Value - a.NextNode.NextNode.Value);
var bNextDistToNext = Math.Abs(b.NextNode.Value - b.NextNode.NextNode.Value);
return bNextDistToNext.CompareTo(aNextDistToNext);
}
});
Which gives the desired results:
0 =>
0.5
0.7
1 =>
1.3
1.4
2 =>
0.7
0.9
3 =>
1.1
1.3
1.2
1.3
Upvotes: 1