JiminP
JiminP

Reputation: 2142

How to convert RGB image to one-hot encoded 3d array based on color using numpy?

Simply put, what I'm trying to do is similar to this question: Convert RGB image to index image, but instead of 1-channel index image, I want to get n-channel image where img[h, w] is a one-hot encoded vector. For example, if the input image is [[[0, 0, 0], [255, 255, 255]], and index 0 is assigned to black and 1 is assigned to white, then the desired output is [[[1, 0], [0, 1]]].

Like the previous person asked the question, I have implemented this naively, but the code runs quite slowly, and I believe a proper solution using numpy would be significantly faster.

Also, as suggested in the previous post, I can preprocess each image into grayscale and one-hot encode the image, but I want a more general solution.

Example

Say I want to assign white to 0, red to 1, blue to 2, and yellow to 3:

(255, 255, 255): 0
(255, 0, 0): 1
(0, 0, 255): 2
(255, 255, 0): 3

, and I have an image which consists of those four colors, where image is a 3D array containing R, G, B values for each pixel:

[
    [[255, 255, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
    [[  0,   0, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
    [[  0,   0, 255], [  0,   0, 255], [255, 255, 255], [255, 255, 255]],
    [[255, 255, 255], [255, 255, 255], [255, 255,   0], [255, 255,   0]]
]

, and this is what I want to get where each pixel is changed to one-hot encoded values of index. (Since changing a 2d array of index values to 3d array of one-hot encoded values is easy, getting a 2d array of index values is fine too.)

[
    [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
    [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
]

In this example I used colors where RGB components are either 255 or 0, but I don't want to solutions rely on that fact.

Upvotes: 5

Views: 7391

Answers (3)

Ivan
Ivan

Reputation: 40618

A simple implementation involves masking the relevant pixel positions, whether it's for converting from label to color or vice-versa. I show here how to convert between dense (1-channel labels), OHE (one-hot-encoding sparse), and RGB formats. Essentially performing OHE<->RGB<->dense.


Having defined your RGB-encoded input as rgb.

First define the color label to color mapping (no need for a dict here):

>>> colors = np.array([[ 255, 255, 255],
                       [ 255,   0,   0],
                       [   0,   0, 255],
                       [ 255, 255,   0]])

RGB (h, w, 3) to dense (h, w)

dense = np.zeros(seg.shape[:2])
for label, color in enumerate(colors):
    dense[np.all(seg == color, axis=-1)] = label

RGB (h, w, 3) to OHE (h, w, #classes)

Similar to the previous conversion, RGB to one-hot-encoding requires two additional lines:

ohe = np.zeros((*seg.shape[:2], len(colors)))
for label, color in enumerate(colors):
    v = np.zeros(len(colors))
    v[label] = 1
    ohe[np.all(seg == color, axis=-1)] = v

dense (h, w) to RGB (h, w, 3)

rgb = np.zeros((*labels.shape, 3))
for label, color in enumerate(colors):
    rgb[labels == label] = color

OHE (h, w, #classes) to RGB (h, w, 3)

Converting from OHE to dense requires one line:

dense = ohe.argmax(-1)

Then you can simply follow dense->RGB.

Upvotes: 0

MonsterMax
MonsterMax

Reputation: 35

My solution looks like this and should work for arbitrary colors:

color_dict = {0: (0,   255, 255),
              1: (255, 255,   0),
              ....}


def rgb_to_onehot(rgb_arr, color_dict):
    num_classes = len(color_dict)
    shape = rgb_arr.shape[:2]+(num_classes,)
    arr = np.zeros( shape, dtype=np.int8 )
    for i, cls in enumerate(color_dict):
        arr[:,:,i] = np.all(rgb_arr.reshape( (-1,3) ) == color_dict[i], axis=1).reshape(shape[:2])
    return arr


def onehot_to_rgb(onehot, color_dict):
    single_layer = np.argmax(onehot, axis=-1)
    output = np.zeros( onehot.shape[:2]+(3,) )
    for k in color_dict.keys():
        output[single_layer==k] = color_dict[k]
    return np.uint8(output)

I haven't tested it for speed yet, but at least, it works :)

Upvotes: 1

Divakar
Divakar

Reputation: 221514

We could generate the decimal equivalents of each pixel color. With each channel having 0 or 255 as the value, there would be total 8 possibilities, but it seems we are only interested in four of those colors.

Then, we would have two ways to solve it :

  • One would involve making unique indices from those decimal equivalents starting from 0 till the final color, all in sequence and finally initializing an output array and assigning into it.

  • Other way would be to use broadcasted comparisons of those decimal equivalents against the colors.

These two methods are listed next -

def indexing_based(a):
    b = (a == 255).dot([4,2,1])  # Decimal equivalents
    colors = np.array([7,4,1,6]) # Define colors decimal equivalents here
    idx = np.empty(colors.max()+1,dtype=int)
    idx[colors] = np.arange(len(colors))
    m,n,r = a.shape
    out = np.zeros((m,n,len(colors)), dtype=int)
    out[np.arange(m)[:,None], np.arange(n), idx[b]] = 1
    return out

def broadcasting_based(a):
    b = (a == 255).dot([4,2,1])  # Decimal equivalents
    colors = np.array([7,4,1,6]) # Define colors decimal equivalents here
    return (b[...,None] == colors).astype(int)

Sample run -

>>> a = np.array([
...     [[255, 255, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
...     [[  0,   0, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
...     [[  0,   0, 255], [  0,   0, 255], [255, 255, 255], [255, 255, 255]],
...     [[255, 255, 255], [255, 255, 255], [255, 255,   0], [255, 255,   0]],
...     [[255, 255, 255], [255,   0,   0], [255, 255,   0], [255,  0 ,   0]]])
>>> indexing_based(a)
array([[[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]],

       [[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 0]]])
>>> np.allclose(broadcasting_based(a), indexing_based(a))
True

Upvotes: 0

Related Questions