Reputation: 10379
I have an OpenCV image, as usual in BGR color space, and I need to convert it to CMYK. I searched online but found basically only (slight variations of) the following approach:
```python
import numpy

def bgr2cmyk(cv2_bgr_image):
    bgrdash = cv2_bgr_image.astype(float) / 255.0

    # Calculate K as (1 - whatever is biggest out of Rdash, Gdash, Bdash)
    K = 1 - numpy.max(bgrdash, axis=2)

    with numpy.errstate(divide="ignore", invalid="ignore"):
        # Calculate C
        C = (1 - bgrdash[..., 2] - K) / (1 - K)
        C = 255 * C
        C = C.astype(numpy.uint8)

        # Calculate M
        M = (1 - bgrdash[..., 1] - K) / (1 - K)
        M = 255 * M
        M = M.astype(numpy.uint8)

        # Calculate Y
        Y = (1 - bgrdash[..., 0] - K) / (1 - K)
        Y = 255 * Y
        Y = Y.astype(numpy.uint8)

    # note: unlike C, M and Y (uint8, 0..255), K is returned as float in [0, 1]
    return (C, M, Y, K)
```
This works fine, but it feels quite slow: for an 800 x 600 px image it takes about 30 ms on my i7 CPU. Typical cv2 operations such as thresholding take only a few ms for the same image, so since this is all NumPy, I was expecting this CMYK conversion to be faster.
However, I haven't found anything that makes it significantly faster. There is a conversion to CMYK via PIL.Image, but the resulting channels do not look like the ones produced by the algorithm listed above.
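A likely reason for the mismatch, offered as an assumption rather than something verified here: Pillow's RGB to CMYK conversion appears to be a plain inversion (C = 255 - R, M = 255 - G, Y = 255 - B) that leaves K at 0, so no black is extracted at all. The per-pixel sketch below (pure Python; both function names are hypothetical) contrasts that with the K-extraction formula above on a mid-gray pixel.

```python
def naive_cmyk_pixel(b, g, r):
    # Plain inversion with an empty black channel (assumed Pillow behaviour)
    return (255 - r, 255 - g, 255 - b, 0)


def k_extraction_cmyk_pixel(b, g, r):
    # Same formula as bgr2cmyk above, on the 0..255 scale with integer math
    j = max(r, g, b)
    if j == 0:
        return (0, 0, 0, 255)  # pure black: everything goes into K
    return (
        255 * (j - r) // j,    # C
        255 * (j - g) // j,    # M
        255 * (j - b) // j,    # Y
        255 - j,               # K
    )


gray = (128, 128, 128)  # B, G, R
print(naive_cmyk_pixel(*gray))         # (127, 127, 127, 0)
print(k_extraction_cmyk_pixel(*gray))  # (0, 0, 0, 127)
```

The special case for pure black is a choice made for this sketch; the code in the question divides 0 by 0 there.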
Any other ideas?
Upvotes: 5
Views: 684
Reputation: 15364
There are several things you should do:
Given
RGB' = RGB / 255
K = 1 - max(RGB')
C = (1-K - R') / (1-K)
M = (1-K - G') / (1-K)
Y = (1-K - B') / (1-K)
Notice that 1 - K is simply max(RGB'). Call it J and factor it out:
RGB' = RGB / 255
J = max(RGB')
K = 1 - J
C = (J - R') / J
M = (J - G') / J
Y = (J - B') / J
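The substitution 1 - K = J can be sanity-checked numerically. This sketch (function names are hypothetical) evaluates both forms on a random image that contains no pure-black pixels, since both forms divide 0 by 0 there.

```python
import numpy as np


def original_form(bgrdash):
    # C, M, Y = (1 - K - X') / (1 - K) with K = 1 - max(RGB')
    K = 1.0 - bgrdash.max(axis=2)
    return [(1.0 - K - bgrdash[..., ch]) / (1.0 - K) for ch in (2, 1, 0)]


def factored_form(bgrdash):
    # C, M, Y = (J - X') / J with J = max(RGB') = 1 - K
    J = bgrdash.max(axis=2)
    return [(J - bgrdash[..., ch]) / J for ch in (2, 1, 0)]


rng = np.random.default_rng(0)
# Channel values 1..255, so no pixel is pure black
img = rng.integers(1, 256, size=(60, 80, 3), dtype=np.uint8)
bgrdash = img.astype(np.float64) / 255.0

same = all(np.allclose(a, b) for a, b in zip(original_form(bgrdash), factored_form(bgrdash)))
print(same)  # True
```

The two forms differ only by floating-point rounding in 1 - (1 - J), which is why np.allclose rather than exact equality is used.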
Don't normalize to [0, 1] for these calculations. The max() can be done on integers, and so can the differences; K can be calculated entirely with integer math.
J = max(RGB)
K = 255 - J
C = 255 * (J - R) / J
M = 255 * (J - G) / J
Y = 255 * (J - B) / J
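The integer formulas also map directly onto plain NumPy without normalizing to [0, 1]. This is a sketch of my own (the name bgr2cmyk_int is hypothetical), not one of the timed variants. It uses uint16 intermediates, since 255 * 255 = 65025 still fits, and treats pure-black pixels (J = 0) as C = M = Y = 0.

```python
import numpy as np


def bgr2cmyk_int(bgr_img):
    # uint16 intermediates: 255 * (J - X) is at most 65025, which fits
    img = bgr_img.astype(np.uint16)
    B, G, R = img[..., 0], img[..., 1], img[..., 2]
    J = np.maximum(np.maximum(R, G), B)  # integer max, no division by 255
    K = 255 - J
    # J == 0 only for black pixels; divide by 1 there so C = M = Y = 0
    Jsafe = np.maximum(J, 1)
    C = 255 * (J - R) // Jsafe
    M = 255 * (J - G) // Jsafe
    Y = 255 * (J - B) // Jsafe
    return np.dstack((C, M, Y, K)).astype(np.uint8)


pixels = np.array([[[51, 102, 204], [0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
print(bgr2cmyk_int(pixels)[0].tolist())
# [[0, 127, 191, 51], [0, 0, 0, 255], [0, 0, 0, 0]]
```

It still allocates several temporary arrays, so it is not expected to match the Numba variants; it only shows that the integer reformulation is exact up to floor rounding.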
```python
import numba
import numpy as np
```
Numba compiles the loop to machine code instead of chaining NumPy library routines, and with parallel=True and numba.prange it also spreads the rows across CPU cores. error_model="numpy" makes division by zero produce inf/NaN instead of raising an exception, and fastmath=True lets LLVM relax strict floating-point semantics, which makes the math a little faster.
Both variants significantly outperform a plain Python/NumPy solution. Much of that is due to better use of CPU registers and caches, rather than materializing intermediate arrays as is usual with NumPy.
First variant: ~1.9 ms
```python
@numba.njit(parallel=True, error_model="numpy", fastmath=True)
def bgr2cmyk_v4(bgr_img):
    bgr_img = np.ascontiguousarray(bgr_img)
    (height, width) = bgr_img.shape[:2]
    CMYK = np.empty((height, width, 4), dtype=np.uint8)
    for i in numba.prange(height):  # rows are processed in parallel
        for j in range(width):
            B, G, R = bgr_img[i, j]
            J = max(R, G, B)
            K = np.uint8(255 - J)
            # for black pixels (J == 0) this is 0/0: error_model="numpy" yields NaN
            # instead of raising, and the uint8 value it is cast to is not well defined
            C = np.uint8(255 * (J - R) / J)
            M = np.uint8(255 * (J - G) / J)
            Y = np.uint8(255 * (J - B) / J)
            CMYK[i, j] = (C, M, Y, K)
    return CMYK
```
Thanks to Cris Luengo for pointing out further refactoring potential (pulling out 255/J), which led to a second variant. It takes ~1.6 ms:
```python
@numba.njit(parallel=True, error_model="numpy", fastmath=True)
def bgr2cmyk_v5(bgr_img):
    bgr_img = np.ascontiguousarray(bgr_img)
    (height, width) = bgr_img.shape[:2]
    CMYK = np.empty((height, width, 4), dtype=np.uint8)
    for i in numba.prange(height):
        for j in range(width):
            B, G, R = bgr_img[i, j]
            J = np.uint8(max(R, G, B))
            # fixed point math: Jinv is 255/J scaled by 256 (8 fractional bits)
            Jinv = np.uint16((255 * 256) // J)
            K = np.uint8(255 - J)
            # multiply by the reciprocal, then shift out the 8 fractional bits
            C = np.uint8(((J - R) * Jinv) >> 8)
            M = np.uint8(((J - G) * Jinv) >> 8)
            Y = np.uint8(((J - B) * Jinv) >> 8)
            CMYK[i, j] = (C, M, Y, K)
    return CMYK
```
This fixed-point math rounds down (floor). For round-to-nearest, the expression becomes ((J - R) * Jinv + 128) >> 8, which costs a bit more time (~1.8 ms).
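Because Jinv = (255*256) // J is itself truncated, the fixed-point result can land slightly below the exact value. There are only a few tens of thousands of (J, X) pairs, so a brute-force pure-Python check is cheap. This sketch (helper names are hypothetical) compares both the floor and the +128 rounding variant against exact integer references and collects the differences.

```python
def exact_floor(J, X):
    # floor(255 * (J - X) / J), computed exactly with integers
    return 255 * (J - X) // J


def exact_round(J, X):
    # 255 * (J - X) / J rounded to nearest (halves round up), exactly
    return (2 * 255 * (J - X) + J) // (2 * J)


def fixed_point(J, X, rounding=False):
    Jinv = (255 * 256) // J        # same reciprocal as in bgr2cmyk_v5
    bias = 128 if rounding else 0  # +128 is 0.5 in this fixed-point format
    return ((J - X) * Jinv + bias) >> 8


floor_diffs, round_diffs = set(), set()
for J in range(1, 256):            # J = 0 (black) excluded: division by zero
    for X in range(J + 1):         # X stands for R, G or B, so 0 <= X <= J
        floor_diffs.add(exact_floor(J, X) - fixed_point(J, X))
        round_diffs.add(exact_round(J, X) - fixed_point(J, X, rounding=True))

print(sorted(floor_diffs), sorted(round_diffs))
```

Truncating Jinv loses less than (J - X) / 256 < 1 after the shift, so both sets should be subsets of {0, 1}: the fixed-point value is never above the exact reference and at most one step below it.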
As far as I can tell, numba/LLVM did not apply SIMD here: some investigation showed that LLVM's loop vectorizer rejected every loop it was asked to consider.
An OpenCL kernel might be even faster. OpenCL can run on CPUs.
Upvotes: 3
Reputation: 10151
I would start by profiling which part is the bottleneck, e.g. how fast is it without the / (1 - K) calculation?
If that is the slow part, precalculating 1/(1-K) might help. Even precalculating 255/(1-K) is possible:
```python
import numpy

def bgr2cmyk(cv2_bgr_image):
    bgrdash = cv2_bgr_image.astype(float) / 255.0
    K = 1 - numpy.max(bgrdash, axis=2)
    with numpy.errstate(divide="ignore", invalid="ignore"):
        # precalculated once and reused for all three channels (inf where K == 1)
        kRez255 = 255 / (1 - K)
        # Calculate C
        C = (1 - bgrdash[..., 2] - K) * kRez255
        C = C.astype(numpy.uint8)
        # Calculate M
        M = (1 - bgrdash[..., 1] - K) * kRez255
        M = M.astype(numpy.uint8)
        # Calculate Y
        Y = (1 - bgrdash[..., 0] - K) * kRez255
        Y = Y.astype(numpy.uint8)
    return (C, M, Y, K)
```
But only profiling can show whether the calculation is what slows down the conversion at all.
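A minimal way to do that profiling is to time each stage of the float-based conversion separately with time.perf_counter. The helper below is hypothetical, splits the work into stages of my own choosing, and times only the C channel (M and Y should behave the same way).

```python
import time

import numpy as np


def profile_stages(bgr_img, repeats=20):
    """Return the average time per stage in milliseconds (hypothetical helper)."""
    timings = {}

    def timed(name, fn):
        start = time.perf_counter()
        for _ in range(repeats):
            result = fn()
        timings[name] = (time.perf_counter() - start) / repeats * 1e3
        return result

    # errstate also silences the divide/invalid warnings caused by black pixels
    with np.errstate(divide="ignore", invalid="ignore"):
        bgrdash = timed("normalize", lambda: bgr_img.astype(float) / 255.0)
        K = timed("max", lambda: 1 - np.max(bgrdash, axis=2))
        kRez255 = timed("reciprocal", lambda: 255 / (1 - K))
        C = timed("channel", lambda: (1 - bgrdash[..., 2] - K) * kRez255)
        timed("astype", lambda: C.astype(np.uint8))
    return timings


img = np.random.default_rng(0).integers(0, 256, size=(600, 800, 3), dtype=np.uint8)
for stage, ms in profile_stages(img).items():
    print(f"{stage:>10}: {ms:.2f} ms")
```

Absolute numbers depend on the machine; what matters is which stage dominates.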
Upvotes: 3