dimple
dimple

Reputation: 651

Given a 2D array of numbers, find clusters

Given a 2D array, for example:

0 0 0 0 0
0 2 3 0 1
0 8 5 0 7
7 0 0 0 4

Output should be groups of clusters:

Cluster 1: <2,3,8,5,7>

Cluster 2: <1,7,4>

Upvotes: 16

Views: 21910

Answers (6)

SyntaxNavigator
SyntaxNavigator

Reputation: 523

Using DFS:

import math
import numpy as np
def findConnectivityDFS(matrix):
    rows = matrix.shape[0]
    cols = matrix.shape[1]
    finalResult = {}
    # For 4 - connectivity use the following:
    dx = [+1, 0, -1, 0]
    dy = [0, +1, 0, -1]
    # For 8 - connectivity use the following:
    # dx = [-1, 0, 1,-1 ,1 ,-1,0,1]
    # dy = [-1,-1,-1,-0 ,0 ,1, 1,1]
    label = np.zeros((rows, cols))
    def dfs(x, y, current_label):
        # Taking care of boundaries and visited elements and zero elements in original matrix
        if x < 0 or x == rows or y < 0 or y == cols or label[x][y] != 0 or matrix[x][y] == 0:
            return
        label[x][y] = current_label
        if current_label not in finalResult.keys():
            finalResult[current_label] = []
        finalResult[current_label].append(matrix[x][y])
        for direction in range(len(dx)):
            dfs(x + dx[direction], y + dy[direction], current_label)
    component = 0
    for i in range(rows):
        for j in range(cols):
            if (label[i][j] == 0 and matrix[i][j] != 0):
                component = component + 1
                dfs(i, j, component)
    return finalResult

BFS solution:

    def findConnectivityBFS(matrix):
        dx = [+1, 0, -1, 0]
        dy = [0, +1, 0, -1]
        h, w = matrix.shape
        label = np.zeros((h, w))
        clusters = {}
        component = 1
        for i in range(h):
            for j in range(w):
                if matrix[i][j] != 0 and label[i][j] == 0:
                    cluster = []
                    queue = [(i, j, component)]
                    while queue:
                        x, y, component = queue.pop(0)
                        if 0 <= x < h and 0 <= y < w and label[x][y] == 0 and matrix[x][y] != 0:
                            label[x][y] = component
                            cluster.append(matrix[x][y])
                            for direction in range(len(dx)):
                                queue.append([x+dx[direction],   y+dy[direction], component])
                    clusters[component] = cluster
                    component += 1
        return clusters

Result:

matrix = np.array([
    [0 ,0, 0, 0,0],
    [0, 2, 3, 0, 1],
    [0, 8, 5, 0, 7],
    [7, 0, 0, 0, 4],
])
print(matrix)
print("============findConnectivityDFS============")
print(findConnectivityDFS(matrix))
print("============findConnectivityBFS============")
print(findConnectivityBFS(matrix))

Upvotes: 0

MechStar
MechStar

Reputation: 107

Code sample:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Practice
{
    class Program
    {
        static void Main()
        {
            var matrix = new[] { new StringBuilder("00000"), new StringBuilder("02301"), new StringBuilder("08507"), new StringBuilder("70004") };
            var clusters = 0;
            for (var i = 0; i < matrix.Length; i++)
                for (var j = 0; j < (matrix.Any() ? matrix[i].Length : 0); j++)
                    if (matrix[i][j] != '0')
                    {
                        Console.Write("Cluster {0}: <", ++clusters);
                        var cluster = new List<char>();
                        Traverse(matrix, i, j, ref cluster);
                        Console.WriteLine("{0}>", string.Join(",", cluster));
                    }
            Console.ReadKey();
        }

        private static void Traverse(StringBuilder[] matrix, int row, int col, ref List<char> cluster)
        {
            cluster.Add(matrix[row][col]);
            matrix[row][col] = '0';
            for (var i = -1; i <= 1 && row + i >= 0 && row + i < matrix.Length; i++)
                for (var j = -1; j <= 1 && col + j >= 0 && col + j < (matrix.Any() ? matrix[row + i].Length : 0); j++)
                    if(matrix[row + i][col + j] != '0')
                        Traverse(matrix, row + i, col + j, ref cluster);
        }
    }
}

Algorithm explanation:

  • For each row:
    • For each column in that row:
      • If the item is an unvisited non-zero:
        1. Save the found cluster member;
        2. Mark location at [row, column] as visited;
        3. Check for any unvisited non-zero neighbors while staying in-bounds of the matrix:
          • [row - 1, column - 1];
          • [row - 1, column];
          • [row - 1, column + 1];
          • [row, column - 1];
          • [row, column + 1];
          • [row + 1, column - 1];
          • [row + 1, column];
          • [row + 1, column + 1].
        4. If any neighbor is an unvisited non-zero repeat steps 1-4 recursively until all neighbors are visited zeros (all cluster members have been found).

Upvotes: 3

Ben Jackson
Ben Jackson

Reputation: 93700

You want to do Connected Component Labeling. This is usually considered an image processing algorithm, but it matches what you describe.

You will also get recommendations of K-means because you said 2D array of numbers and it is easy to interpret that as array of 2D numbers. K-means finds clusters of points in a plane, not connected groups in a 2D array like you request.

Upvotes: 3

nmjohn
nmjohn

Reputation: 1432

If you know the number of groups or want to fit your data to a static number of groups, you can do k-means.

http://en.wikipedia.org/wiki/K-means_clustering

Upvotes: 0

saeedn
saeedn

Reputation: 3378

Your problem seems to be finding connected components. You should use a traverse method (either BFS or DFS will do the work). Iterate over all elements, for each non-zero element start a traverse and record all elements you see in that traverse and turn each visited element into zero. Something like the code below:

void DFS(int x, int y)
{
  printf("%d ", g[x][y]);
  g[x][y] = 0;
  // iterate over neighbours
  for(dx=-1; dx<=1; dx++)
    for(dy=-1; dy<=1; dy++)
      if (g[x+dx][y+dy]) DFS(x+dx, y+dy);
}

for(i=0; i<n; i++)
  for(j=0; j<n; j++)
    if (g[i][j])
    {
      DFS(i, j);
      printf("\n");
    }

Upvotes: 6

user1118321
user1118321

Reputation: 26345

One way to do it is with a graph. Traverse the matrix in some order (I'd go left to right, top to bottom). When you encounter a non-zero element, add it to the graph. Then check all of its neighbors (it looks like you want 8-connected neighbors), and for each one that is non-zero, add its node to the graph, and connect it to the current element. The elements in the graph will probably have to keep track of their coordinates so you can see if you're adding a duplicate or not. When you're done traversing the matrix, you have a graph which contains a set of clusters. Clusters should be sub-graphs of connected elements.

Upvotes: 2

Related Questions