Alexander Soare

Reputation: 3257

Are there much faster clustering methods than k-means?

I have a handwritten digit in a box, and I'm trying to extract just the digit. The image is 208 x 117, so about 24k pixels.


I want to take advantage of the fact that the image is in color, so I decided to use a clustering algorithm to isolate the color of the digit and then extract just those pixels. The problem is that I need to get this down to 0.01 s per digit, and sklearn.cluster.KMeans currently takes about 0.15 s. I tried resizing the image, but the resize itself takes time. I also tried thresholding to keep only the colored pixels and ignore the light background, which brings it down to about 10k pixels, but that didn't speed things up much.
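For reference, here is a minimal sketch of the pipeline described above: threshold away the light background, cluster the remaining pixel colors with `sklearn.cluster.KMeans`, and keep the pixels in the digit's cluster. It assumes an H x W x 3 RGB `uint8` array; the function name, `bg_threshold=200`, and the rule "the digit is the most saturated cluster" are hypothetical choices for illustration, not the poster's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans


def extract_digit_mask(img, n_clusters=2, bg_threshold=200, random_state=0):
    """Boolean H x W mask of pixels in the digit's color cluster (sketch)."""
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(np.float64)

    # Threshold away the light background: a pixel counts as background if
    # every channel is >= bg_threshold (hypothetical cutoff).
    fg_idx = np.flatnonzero(~np.all(pixels >= bg_threshold, axis=1))
    fg_pixels = pixels[fg_idx]

    # Cluster the remaining pixel colors in RGB space.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(fg_pixels)

    # Assumption: the digit ink is the most saturated color, so pick the
    # cluster whose center has the largest spread between channels.
    centers = km.cluster_centers_
    saturation = centers.max(axis=1) - centers.min(axis=1)
    digit_label = int(np.argmax(saturation))

    # Scatter the digit cluster back into a full-size mask.
    mask = np.zeros(h * w, dtype=bool)
    mask[fg_idx[labels == digit_label]] = True
    return mask.reshape(h, w)


# Synthetic test image: white background, black box edges, blue "digit".
img = np.full((117, 208, 3), 255, dtype=np.uint8)
img[0, :] = img[-1, :] = 0
img[:, 0] = img[:, -1] = 0
img[40:80, 90:110] = (20, 40, 200)

mask = extract_digit_mask(img)
print(int(mask.sum()))  # 800 (the 40 x 20 blue block)
```

Real scans will have noisier colors than this synthetic image, so the saturation rule and the threshold would need tuning.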

Any ideas?

Upvotes: 0

Views: 478

Answers (1)

Alexander Soare

Reputation: 3257

I found a way. It turns out that reducing the sample size gives a massive speedup, so I randomly sampled a quarter of the pixels and fed only those into the clustering function. That gave me a 50x speedup.
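A sketch of the subsampling trick, assuming scikit-learn's `KMeans`: fit on a random quarter of the pixels, then label every pixel with `predict`, which only assigns each point to its nearest learned center. The answer does not say how the unsampled pixels were labeled, so the `predict` step, the function name `cluster_with_subsample`, and the synthetic data are assumptions. The timings printed depend on hardware, data, and scikit-learn version and are not expected to reproduce the 50x figure exactly.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def cluster_with_subsample(pixels, n_clusters=2, fraction=0.25, seed=0):
    """Fit KMeans on a random fraction of the pixels, then label all of them."""
    rng = np.random.default_rng(seed)
    # Draw the random subset (a quarter by default) without replacement.
    n_sample = max(n_clusters, int(len(pixels) * fraction))
    idx = rng.choice(len(pixels), size=n_sample, replace=False)

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(pixels[idx])  # the expensive iterative fit only sees the subset

    # Assumption: unsampled pixels are labeled by assigning each one to its
    # nearest learned center, which is a single cheap pass over the data.
    return km.predict(pixels), km.cluster_centers_


# Synthetic stand-in for ~10k foreground pixels: blue ink plus dark box lines.
rng = np.random.default_rng(1)
ink = rng.normal((30, 50, 200), 10, size=(3000, 3))
box = rng.normal((40, 40, 40), 10, size=(7000, 3))
pixels = np.vstack([ink, box])

start = time.perf_counter()
KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
full_time = time.perf_counter() - start

start = time.perf_counter()
labels, centers = cluster_with_subsample(pixels)
sub_time = time.perf_counter() - start

print(f"full fit: {full_time:.4f} s, subsampled fit + predict: {sub_time:.4f} s")
```

The design bet is that a quarter of the pixels is enough to locate the few color centers accurately, so the costly iterations run on far fewer points while the final labeling stays linear in the full pixel count.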

Upvotes: 2
