Reputation: 3855
I have seen this question on StackOverflow, and I do want to solve the problem of clustering binary images such as this one from the same abovementioned question using sparse matrix representation:
I know that there are more simple and efficient methods for clustering (KMeans, Mean-shift, etc...), and I'm aware that this problem can be solved with other solutions such as connected components,
What I have tried so far is:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import cv2
import numpy as np
from math import sqrt
from scipy.sparse import csr_matrix
original = cv2.imread("km.png", 0)
img = original.copy()
def distance(p1, p2):
"""
Euclidean Distance
"""
return sqrt((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)
k
clusters:data = np.nonzero(img)
data = tuple(zip(data[0],data[1]))
data = np.asarray(data).reshape(-1,2)
# print(data.shape)
l = data.shape[0] # 68245
# num clusters
k = 2
cov_mat = csr_matrix((l, l, k), dtype = np.int8).toarray()
# inefficient loop!
for i in range(l):
for j in range(l):
if(i!=j):
cov_mat[i, j, 0] = distance(data[i], data[j])
# Centroids
# TODO: check if more than two datapoints with same max distance then take first!
ci = cov_mat[...,0].argmax()
# inefficient loop!
# TODO: repeat for k clusters!
for i in range(l):
for j in range(l):
if(i!=j):
cov_mat[i, j, 0] = distance(data[i], data[ci[0]])
cov_mat[i, j, 1] = distance(data[i], data[ci[1]])
# Labeling
cov_mat[cov_mat[...,0]>cov_mat[...,1]] = +1
cov_mat[cov_mat[...,0]<cov_mat[...,1]] = -1
# Labeling Centroids
cov_mat[ci[0], ci[0]] = +1
cov_mat[ci[1], ci[1]] = -1
# clusters Indicies
cl1 = cov_mat[...,0].argmax()
cl2 = cov_mat[...,0].argmin()
# TODO: pass back the indices to the image to cluster it.
This approach is time costly, can you please tell me how to increase the efficiency please? thanks in advance.
Upvotes: 0
Views: 834
Reputation: 13723
As pointed out in the comments, when working with NumPy arrays, vectorised code should be preferred over scalar code (i.e., code with explicit for
loops). Besides, as I see it, in this particular problem a dense NumPy array is a better choice than a Scipy sparse array. If you are not constrained to utilize a sparse array, this code would do:
import numpy as np
from skimage.io import imread
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
n_clusters = 2
palette = [[255, 0, 0], [0, 255, 0]]
img = imread('https://i.sstatic.net/v1NDe.png')
rows, cols = img.nonzero()
coords = np.stack([rows, cols], axis=1)
labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(coords)
result = np.zeros(shape=img.shape+(3,), dtype=np.uint8)
for label in range(n_clusters):
idx = (labels == label)
result[rows[idx], cols[idx]] = palette[label]
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 8))
ax0.imshow(img, cmap='gray')
ax0.set_axis_off()
ax0.set_title('Binary image')
ax1.imshow(result)
ax1.set_axis_off()
ax1.set_title('Cluster labels');
Upvotes: 2