Reputation: 30225
Is there any way to do the following in pure NumPy (or OpenCV)?
from collections import defaultdict
import cv2
img = cv2.imread("test.jpg")
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
The problem is that tuple(val) can be any of 2^24 different values, so keeping an array with one slot per possible value is not feasible: it would be gigantic and mostly zeros. I need a more efficient data structure.
Upvotes: 2
Views: 2961
Reputation: 67457
The fastest way around this, if the image is stored in "chunky" format (i.e. the color planes dimension is the last one, and that last dimension is contiguous), is to take an np.void view of every 24-bit pixel, then run the result through np.unique and np.bincount:
>>> arr = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
>>> dt = np.dtype((np.void, arr.shape[-1]*arr.dtype.itemsize))
>>> if arr.strides[-1] != arr.dtype.itemsize:
...     arr = np.ascontiguousarray(arr)
...
>>> arr_view = arr.view(dt)
The contents of arr_view look like garbage:
>>> arr_view[0, 0]
array([Â],
      dtype='|V3')
But we are not the ones who need to understand the content:
>>> unq, inv = np.unique(arr_view, return_inverse=True)
>>> unq_cnts = np.bincount(inv.ravel())  # ravel() in case the inverse is not 1-D
>>> unq = unq.view(arr.dtype).reshape(-1, arr.shape[-1])
And now you have the unique pixels and their counts in those two arrays:
>>> unq[:5]
array([[  0,  82,  78],
       [  6, 221, 188],
       [  9, 209,  85],
       [ 14, 210,  24],
       [ 14, 254,  88]], dtype=uint8)
>>> unq_cnts[:5]
array([1, 1, 1, 1, 1], dtype=int64)
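As a possible shortcut, np.unique can do the row-wise comparison itself. The following is a minimal sketch, assuming a NumPy version whose np.unique supports the axis and return_counts arguments; the function name count_colors and the tiny test image are made up for illustration.

```python
import numpy as np

def count_colors(img):
    """Return (unique_pixels, counts) for an (H, W, C) image array.

    count_colors is a hypothetical helper name, not part of NumPy.
    """
    # Flatten to one row per pixel, keeping the channel axis last.
    pixels = img.reshape(-1, img.shape[-1])
    # axis=0 treats each row (pixel) as one item to compare.
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return colors, counts

# Tiny 2x2 illustrative image: three identical pixels and one black pixel.
img = np.array([[[1, 2, 3], [1, 2, 3]],
                [[0, 0, 0], [1, 2, 3]]], dtype=np.uint8)
colors, counts = count_colors(img)
```

Here colors holds one row per distinct pixel and counts holds the matching number of occurrences, in the same roles as unq and unq_cnts above.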
Upvotes: 4
Reputation: 97331
Here is my solution:
1. sort() the array.
2. diff() to find all the positions where the color changes.
3. diff() again to find the count of every color.
The code:
In [50]:
from collections import defaultdict
import cv2
import numpy as np
img = cv2.imread("test.jpg")
In [51]:
%%time
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
Wall time: 1.29 s
In [53]:
%%time
img2 = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2).view(np.uint32).ravel()
img2.sort()
pos = np.r_[0, np.where(np.diff(img2) != 0)[0] + 1]
count = np.r_[np.diff(pos), len(img2) - pos[-1]]
r, g, b, _ = img2[pos].view(np.uint8).reshape(-1, 4).T
colors = zip(r, g, b)
result = dict(zip(colors, count))
Wall time: 177 ms
In [49]:
counts == result
Out[49]:
True
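To make the sort/diff steps easier to follow, here is a small sketch on a hand-made array of already packed pixel values. The values in packed are invented for illustration and stand in for img2 above.

```python
import numpy as np

# Six made-up "packed pixel" values standing in for img2.
packed = np.array([7, 3, 7, 7, 3, 9], dtype=np.uint32)

# Step 1: sort, so equal values form contiguous runs.
packed.sort()

# Step 2: a nonzero difference marks the boundary between two runs;
# adding 1 turns it into the start index of the next run.
pos = np.r_[0, np.flatnonzero(np.diff(packed)) + 1]

# Step 3: the distance between consecutive run starts is the run length;
# the array length closes the last run.
count = np.diff(np.r_[pos, len(packed)])

result = dict(zip(packed[pos].tolist(), count.tolist()))
```

After this runs, pos holds the start index of each run of equal values and count holds the run lengths, so result maps every distinct value to its number of occurrences.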
If you can use pandas, you can call pandas.value_counts(), which is implemented in Cython with a hash table.
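A rough sketch of the pandas route, under a few assumptions: each pixel is first packed into a single integer so that it becomes one hashable value, and the Series.value_counts method is used rather than the top-level function named above. The sample image and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny 2x2 illustrative image: three identical pixels and one black pixel.
img = np.array([[[1, 2, 3], [1, 2, 3]],
                [[0, 0, 0], [1, 2, 3]]], dtype=np.uint8)

# Pack the three 8-bit channels of each pixel into one 32-bit integer.
px = img.reshape(-1, 3).astype(np.uint32)
packed = (px[:, 0] << 16) | (px[:, 1] << 8) | px[:, 2]

# Count occurrences of each packed value.
vc = pd.Series(packed).value_counts()

# Unpack the integer keys back into channel tuples.
result = {}
for key, n in vc.items():
    key = int(key)
    result[((key >> 16) & 255, (key >> 8) & 255, key & 255)] = int(n)
```

The packing step plays the same role as the uint32 view in the sort-based code: it turns a three-channel pixel into a single scalar that can be counted directly.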
Upvotes: 2