Count of values in numpy.ndarray

Question

Is there any way to do the following in pure NumPy (or OpenCV)?

from collections import defaultdict
import cv2

img = cv2.imread("test.jpg")
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1

The problem is that tuple(val) can be any of 2^24 different values, so keeping an array with one slot per possible value is impractical: it would be gigantic and mostly zeros. I need a more efficient data structure.

HYRY · Accepted Answer

Here is my solution:

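Convert the image to a one-dimensional array with dtype=uint32, padding each pixel with a zero fourth byte so that it fills one 32-bit integer.
sort() the array.
Use diff() to find every position where the color changes.
Use diff() again on those positions to get the count of each color.

The code: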

In [50]:
from collections import defaultdict
import cv2
import numpy as np
img = cv2.imread("test.jpg")

In [51]:
%%time
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
Wall time: 1.29 s

In [53]:
%%time
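# Pad each 3-byte pixel with a zero byte and view it as one uint32 per pixel.
img2 = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2).view(np.uint32).ravel()
img2.sort()
# Start index of each run of identical values in the sorted array.
pos = np.r_[0, np.where(np.diff(img2) != 0)[0] + 1]
# Run lengths: gaps between consecutive starts, plus the final run.
count = np.r_[np.diff(pos), len(img2) - pos[-1]]
# Unpack each unique value back into its bytes (channel order is whatever cv2 loaded, i.e. B, G, R).
r, g, b, _ = img2[pos].view(np.uint8).reshape(-1, 4).T
colors = zip(r, g, b)
result = dict(zip(colors, count))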
Wall time: 177 ms

In [49]:
counts == result
Out[49]:
True
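
As a more compact variant of the same sort-and-count idea, np.unique with return_counts=True can do the sorting and run-length counting in one call. The following is only a sketch, not part of the timings above: the helper name count_colors, the bit-shift packing (used here instead of the view trick), and the synthetic random image standing in for test.jpg are all assumptions made for illustration.

```python
import numpy as np

def count_colors(img):
    """Count occurrences of each 3-channel uint8 pixel in an H x W x 3 array."""
    # Pack the three channels into one integer per pixel so each color is a single key.
    flat = img.reshape(-1, 3).astype(np.uint32)
    packed = (flat[:, 0] << 16) | (flat[:, 1] << 8) | flat[:, 2]
    # np.unique returns the sorted distinct keys and how often each one occurs.
    keys, counts = np.unique(packed, return_counts=True)
    # Unpack the keys back into per-channel values, in the same channel order as img.
    c0 = (keys >> 16) & 0xFF
    c1 = (keys >> 8) & 0xFF
    c2 = keys & 0xFF
    return {(int(a), int(b), int(c)): int(n)
            for a, b, c, n in zip(c0, c1, c2, counts)}

# Synthetic stand-in for cv2.imread("test.jpg"): a small random 3-channel image.
rng = np.random.default_rng(0)
img = rng.integers(0, 4, size=(32, 32, 3), dtype=np.uint8)
result = count_colors(img)
```

The keys of result are plain Python int tuples in the image's own channel order, so they compare equal to the tuple(val) keys of the loop version.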

If you can use pandas, you can call pandas.value_counts(), which is implemented in Cython using a hash table.
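
A minimal sketch of the pandas route, assuming each pixel is first packed into a single integer key; it uses the Series.value_counts() method rather than the top-level function. The packing scheme, the variable names, and the synthetic image standing in for test.jpg are illustrative assumptions, and this was not part of the timings above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for cv2.imread("test.jpg"): a small random 3-channel image.
rng = np.random.default_rng(0)
img = rng.integers(0, 4, size=(32, 32, 3), dtype=np.uint8)

# Pack the three channels of each pixel into one integer key.
flat = img.reshape(-1, 3).astype(np.uint32)
packed = (flat[:, 0] << 16) | (flat[:, 1] << 8) | flat[:, 2]

# value_counts() returns a Series indexed by key, holding the count of each key.
vc = pd.Series(packed).value_counts()

# Unpack each key back into a per-channel tuple.
result = {}
for k, n in vc.items():
    key = int(k)
    result[((key >> 16) & 0xFF, (key >> 8) & 0xFF, key & 0xFF)] = int(n)
```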
