Reputation: 651
Given a 2D array, for example:
0 0 0 0 0
0 2 3 0 1
0 8 5 0 7
7 0 0 0 4
Output should be groups of clusters:
Cluster 1: <2,3,8,5,7>
Cluster 2: <1,7,4>
Upvotes: 16
Views: 21910
Reputation: 523
Using DFS:
import math
import numpy as np
def findConnectivityDFS(matrix):
rows = matrix.shape[0]
cols = matrix.shape[1]
finalResult = {}
# For 4 - connectivity use the following:
dx = [+1, 0, -1, 0]
dy = [0, +1, 0, -1]
# For 8 - connectivity use the following:
# dx = [-1, 0, 1,-1 ,1 ,-1,0,1]
# dy = [-1,-1,-1,-0 ,0 ,1, 1,1]
label = np.zeros((rows, cols))
def dfs(x, y, current_label):
# Taking care of boundaries and visited elements and zero elements in original matrix
if x < 0 or x == rows or y < 0 or y == cols or label[x][y] != 0 or matrix[x][y] == 0:
return
label[x][y] = current_label
if current_label not in finalResult.keys():
finalResult[current_label] = []
finalResult[current_label].append(matrix[x][y])
for direction in range(len(dx)):
dfs(x + dx[direction], y + dy[direction], current_label)
component = 0
for i in range(rows):
for j in range(cols):
if (label[i][j] == 0 and matrix[i][j] != 0):
component = component + 1
dfs(i, j, component)
return finalResult
BFS solution:
def findConnectivityBFS(matrix):
dx = [+1, 0, -1, 0]
dy = [0, +1, 0, -1]
h, w = matrix.shape
label = np.zeros((h, w))
clusters = {}
component = 1
for i in range(h):
for j in range(w):
if matrix[i][j] != 0 and label[i][j] == 0:
cluster = []
queue = [(i, j, component)]
while queue:
x, y, component = queue.pop(0)
if 0 <= x < h and 0 <= y < w and label[x][y] == 0 and matrix[x][y] != 0:
label[x][y] = component
cluster.append(matrix[x][y])
for direction in range(len(dx)):
queue.append([x+dx[direction], y+dy[direction], component])
clusters[component] = cluster
component += 1
return clusters
Result:
matrix = np.array([
[0 ,0, 0, 0,0],
[0, 2, 3, 0, 1],
[0, 8, 5, 0, 7],
[7, 0, 0, 0, 4],
])
print(matrix)
print("============findConnectivityDFS============")
print(findConnectivityDFS(matrix))
print("============findConnectivityBFS============")
print(findConnectivityBFS(matrix))
Upvotes: 0
Reputation: 107
Code sample:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Practice
{
class Program
{
static void Main()
{
var matrix = new[] { new StringBuilder("00000"), new StringBuilder("02301"), new StringBuilder("08507"), new StringBuilder("70004") };
var clusters = 0;
for (var i = 0; i < matrix.Length; i++)
for (var j = 0; j < (matrix.Any() ? matrix[i].Length : 0); j++)
if (matrix[i][j] != '0')
{
Console.Write("Cluster {0}: <", ++clusters);
var cluster = new List<char>();
Traverse(matrix, i, j, ref cluster);
Console.WriteLine("{0}>", string.Join(",", cluster));
}
Console.ReadKey();
}
private static void Traverse(StringBuilder[] matrix, int row, int col, ref List<char> cluster)
{
cluster.Add(matrix[row][col]);
matrix[row][col] = '0';
for (var i = -1; i <= 1 && row + i >= 0 && row + i < matrix.Length; i++)
for (var j = -1; j <= 1 && col + j >= 0 && col + j < (matrix.Any() ? matrix[row + i].Length : 0); j++)
if(matrix[row + i][col + j] != '0')
Traverse(matrix, row + i, col + j, ref cluster);
}
}
}
Algorithm explanation:
Upvotes: 3
Reputation: 93700
You want to do Connected Component Labeling. This is usually considered an image processing algorithm, but it matches what you describe.
You will also get recommendations of K-means because you said 2D array of numbers and it is easy to interpret that as array of 2D numbers. K-means finds clusters of points in a plane, not connected groups in a 2D array like you request.
Upvotes: 3
Reputation: 1432
If you know the number of groups or want to fit your data to a static number of groups, you can do k-means.
http://en.wikipedia.org/wiki/K-means_clustering
Upvotes: 0
Reputation: 3378
Your problem seems to be finding connected components. You should use a traverse method (either BFS or DFS will do the work). Iterate over all elements, for each non-zero element start a traverse and record all elements you see in that traverse and turn each visited element into zero. Something like the code below:
void DFS(int x, int y)
{
printf("%d ", g[x][y]);
g[x][y] = 0;
// iterate over neighbours
for(dx=-1; dx<=1; dx++)
for(dy=-1; dy<=1; dy++)
if (g[x+dx][y+dy]) DFS(x+dx, y+dy);
}
for(i=0; i<n; i++)
for(j=0; j<n; j++)
if (g[i][j])
{
DFS(i, j);
printf("\n");
}
Upvotes: 6
Reputation: 26345
One way to do it is with a graph. Traverse the matrix in some order (I'd go left to right, top to bottom). When you encounter a non-zero element, add it to the graph. Then check all of its neighbors (it looks like you want 8-connected neighbors), and for each one that is non-zero, add its node to the graph, and connect it to the current element. The elements in the graph will probably have to keep track of their coordinates so you can see if you're adding a duplicate or not. When you're done traversing the matrix, you have a graph which contains a set of clusters. Clusters should be sub-graphs of connected elements.
Upvotes: 2