Reputation: 56189

Cluster numbers in big matrix

I need to cluster all numbers (in one cluster can be numbers with same value, for example just 5) in matrix different than passed and return dictionary like

    {number1:[[(3,4),(4,5)],[]..], number2:...}#I am using Python

I can iterate through rows and columns and when I find number different than passed number x I start floodfill and create cluster and remember visited positions, to avoid double same clusters, and it works. I wonder does anyone has better idea, quicker ?

FOR EXAMPLE ( I want to cluster all numbers different than 1) passed value 1

2  1  1  2
2  1  2  2 
1  1  3  3

I would get { 2:[(0,0),(1,0)],[(0,3),(1,2)(1,3)], 3:[(2,2),(2,3)]}

Upvotes: 1

Answers (2)

siddharthlatest

Reputation: 2257

Let's classify the cell values into target value and replacement value(s), where the cells with target value are the ones you want to modify. You want to cluster the ones with replacement value(s). In your example, these values happen to be 1, and (2,3) respectively.

One usual application of flood-fill is to well, change cells with replacement value(s) to ones with target value, e.g. bucket fill tool in paint applications. If this is your use-case, you can simply change the cell values each time you visit the cell, thus removing the need to remember if you visited it previously. I am assuming that's not your use-case.

Method #1: Using dictionary

I would use a dictionary with (row, col) of the visited cell as key. Since you want to see if you have visited a (row, col), you can do it in O(1) time-complexity. Your method will need to first go to the particular replacement value key, and then iterate through the list to find if the (current row, current col) is present in it. It's time-complexity is proportional to O(k), where k is the number of elements in the list. In the worst case, it will be O(RxC), where RxC is the dimension of the matrix.

Method #2: Using a bool matrix

Another approach which is simple is to have a matrix of type bool with same dimensions as the cell value matrix. Each time you visit a cell, mark it as True. You can check if a cell is already visited then in O(1).

In the worst case, both the above data structures will have a space-complexity of O(RxC). I am assuming this is fine, since you already have a matrix of the same order for cell values.

Upvotes: 1

Adam Cadien

Reputation: 1167

Have you considered using numpy.where()? If you can get your data into an array you can simply use:

 import numpy as np
 data=np.array(<some data>)
 np.where(data!=2)

to grab the indeces of the elements not equal to two. For that matter numpy array's work with direct comparisons such that

 data > 5

will return a boolean array where the comparison is true, and:

 data[data > 5]

will return the values where comparison is true such that you probably won't even need numpy.where().

Upvotes: 1

Cluster numbers in big matrix

Answers (2)

Related Questions