jocull
jocull

Reputation: 21095

Grouping numeric list into fixed sets of like numbers (with circular shifting preferred)

This is probably more of a math question, but it will end up as code eventually, so here goes...

I have an arbitrary set of numbers that need to be grouped by nearest similarity into a fixed number of groups, as entered by the user. Both the length of the set and the value of the numbers can vary.

Here's a sample set...

1.2
1.3
0.5
0.7
1.3
1.4
0.7
0.9
1.1
1.3

The order of these items must be preserved - the final set cannot be reordered, however it can be shifted. For example, the last numbers could become the first numbers. This set is being applied to a circle, so anything that preserves the circular integrity of it is fine.

So if a user were to enter 4 as the desired number of groups, I would expect the output to be as follows:

0 =>
    1.2
    1.3
1 =>
    0.5
    0.7
2 =>
    1.3
    1.4
    0.7
3 =>
    0.9
    1.1
    1.3

In an even better case, the numbers could be shifted to improve their likeness in the groups. For example, by shifting the two last numbers to the beginning...

0 =>
    1.1
    1.3
    1.2
    1.3
1 =>
    0.5
    0.7
2 =>
    1.3
    1.4
3 =>
    0.7
    0.9

Is there an algorithm that would be useful for this? Any pointers on how to pull something like this off?

Thanks!

Upvotes: 2

Views: 153

Answers (1)

nicholas
nicholas

Reputation: 3047

What you want to accomplish is a special case of Single-Linkage clustering, where your domain is linear and circular. Instead of a distance matrix, you will have a distance array. Presumably, your distance function is the absolute value of the difference of the numbers in the array.

This code is by no means the most concise, well-written C#; but it does highlight all of the important steps of the clustering algorithm:

Simple Distance Calculation

class Node
{
    public double Value { get; set; }
    public Node NextNode { get; set; }
    public Cluster Cluster { get; set; }
}

class Cluster : List<Node>
{

}

static void Main()
{
    double[] values = new double[]
    {
        1.2,
        1.3,
        0.5,
        0.7,
        1.3,
        1.4,
        0.7,
        0.9,
        1.1,
        1.3,
    };

    List<Node> nodes = new List<Node>();

    foreach (double value in values)
    {
        nodes.Add(new Node { Value = value });
    }

    // Put each node in a cluster by itself
    foreach (Node node in nodes)
    {
        node.Cluster = new Cluster();
        node.Cluster.Add(node);
    }

    // Create the cirular Linked List here
    // could probably use System.Collections in some way
    // but using simple self written classes for clarity
    for (int n = 1; n < nodes.Count; n++)
    {
        nodes[n - 1].NextNode = nodes[n];
    }
    nodes[nodes.Count - 1].NextNode = nodes[0];

    // Create a sorted distance list
    List<Node> sortedNodes = new List<Node>(nodes);
    sortedNodes.Sort((a, b) =>
        {
            var aDistToNext = Math.Abs(a.Value - a.NextNode.Value);
            var bDistToNext = Math.Abs(b.Value - b.NextNode.Value);

            return aDistToNext.CompareTo(bDistToNext);
        });

    // Register each node / cluster to the output list
    List<Cluster> clusters = new List<Cluster>();
    foreach (Node node in nodes)
    {
        clusters.Add(node.Cluster);
    }


    // Merge clusters until the desired number is reached
    int distIdx = 0;

    while (clusters.Count > 4)
    {
        // Obtain the two next closest nodes
        var nodeA = sortedNodes[distIdx];
        var nodeB = nodeA.NextNode;

        // Merge the nodes into a single cluster
        nodeA.Cluster.AddRange(nodeB.Cluster);

        // Remove the unnecessary cluster from output set
        clusters.Remove(nodeB.Cluster);
        nodeB.Cluster = nodeA.Cluster;

        distIdx++;
    }

    // Print the output results
    for (int n = 0; n < clusters.Count; n++)
    {
        Console.WriteLine("{0} =>", n);

        foreach (Node node in clusters[n])
        {
            Console.WriteLine("\t{0}", node.Value);
        }
    }
}

Now note, that the algorithm works as intended; however, it gives different results than posted in the original question. The difference is due to the ambiguous linking when inter-node distances are equal.

Output:

0 =>
        0.5
        0.7
1 =>
        1.3
        1.4
2 =>
        0.7
3 =>
        0.9
        1.1
        1.3
        1.2
        1.3

Advanced Distance Calculation

If you wanted to prioritize nodes of similar distance joining before connecting to nodes which had already been joined in a previous cluster you could modify the sort algorithm to give priority to unattached nodes:

sortedNodes.Sort((a, b) =>
    {
        var aDistToNext = Math.Abs(a.Value - a.NextNode.Value);
        var bDistToNext = Math.Abs(b.Value - b.NextNode.Value);

        var result = aDistToNext.CompareTo(bDistToNext);

        if (result != 0)
            return result;
        else
        {
            var aNextDistToNext = Math.Abs(a.NextNode.Value - a.NextNode.NextNode.Value);
            var bNextDistToNext = Math.Abs(b.NextNode.Value - b.NextNode.NextNode.Value);

            return bNextDistToNext.CompareTo(aNextDistToNext);
        }
    });

Which gives the desired results:

0 =>
        0.5
        0.7
1 =>
        1.3
        1.4
2 =>
        0.7
        0.9
3 =>
        1.1
        1.3
        1.2
        1.3

Upvotes: 1

Related Questions