Dr.Knowitall

Reputation: 10498

Numpy image slicing returning black patches/ wrong values

The end goal is to take an image and slice it up into samples that I save. The problem is that my slices are randomly returning black or incorrect patches. Below is a small sample program.

import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np

image32 = misc.imread("work0.png")
patches = np.zeros((36, 8, 8))
for i in range(4):
  for j in range(4):
    patches[i*4 + j] = image32[i:i+8,j:j+8]
    misc.imsave("{0}{1}.png".format(i,j), patches[i*4 + j])

An example of my image would be:

[Image: the letter that I'm getting patches of]

The 8x8 patch at (0, 0) yields:

[Image: the saved patch, which comes out black]

Upvotes: 4

Views: 1726

Answers (2)

rayryeng

Reputation: 104535

Two things:

  1. You are initializing your patch matrix with the wrong data type. By default, numpy makes the patches matrix np.float64, and if you save a floating-point image you won't get the results you expect. Specifically, if you consult Mr. F's answer, some scaling is performed on floating-point images: the minimum and maximum values of the image get mapped to black and white respectively. So if a patch is completely uniform, its minimum and maximum are the same and the whole patch gets visualized as black. The best thing is therefore to respect the original image's data type and set the dtype of your patches matrix to np.uint8.

  2. Judging from your for loop indexing, you want to extract out 8 x 8 patches that are non-overlapping. This means that if you have a 32 x 32 image with 8 x 8 patches, you have 16 patches in total arranged in a 4 x 4 grid.

Therefore, you need to change the patches initialization so that its first dimension is 16, not 36. You also have to adjust how you index into the image to extract the 8 x 8 patches, because right now the patches overlap. Specifically, make the row index run from 8*i to 8*(i+1) and the column index from 8*j to 8*(j+1). If you substitute sample values of i and j, you'll see that each grid position yields a unique 8 x 8 patch; a quick sanity check of this index arithmetic is sketched right below.
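Here is a minimal sketch that only prints the row/column ranges the corrected loop will use, so you can confirm the slices tile a 32 x 32 image without overlapping:

for i in range(4):
    print("index", i, "covers pixels", 8 * i, "to", 8 * (i + 1))
# index 0 covers pixels 0 to 8
# index 1 covers pixels 8 to 16
# index 2 covers pixels 16 to 24
# index 3 covers pixels 24 to 32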


With both of the points noted above, the modified code should be:

import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np

image32 = misc.imread('work0.png')

patches = np.zeros((16,8,8), dtype=np.uint8) # Change

for i in range(4):
    for j in range(4):
        patches[i*4 + j] = image32[8*i:8*(i+1),8*j:8*(j+1)] # Change
        misc.imsave("{0}{1}.png".format(i,j), patches[i*4 + j])

When I do this and take a look at the output images, I get what I expect.


To be absolutely sure, let's plot the segments using matplotlib. You've conveniently saved all of the patches in patches, so showing them shouldn't be a problem. However, I'll also leave some code in comments that reads back the images your code saved to disk, so you can verify that it works whether you look at patches or at the files on disk:

import matplotlib.pyplot as plt

plt.figure()
for i in range(4):
    for j in range(4):
        plt.subplot(4, 4, 4*i + j + 1)
        img = patches[4*i + j]
        # or you can do this:
        # img = misc.imread('{0}{1}.png'.format(i,j))
        img = np.dstack([img, img, img])
        plt.imshow(img)

plt.show()

The weird thing about matplotlib.pyplot.imshow is that if you give it a single-channel image (such as yours) whose intensity is the same everywhere, it gets visualized as black no matter what the colour map is, much like what we experienced with imsave. Therefore, I artificially made it an RGB image with all three channels equal, so it gets visualized as grayscale when we show the image.
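(As an aside, a minimal alternative sketch, if you'd rather not stack channels: give imshow an explicit grayscale colour map with a pinned value range, so a uniform patch isn't renormalized.)

# Pinning vmin/vmax stops imshow from stretching a uniform patch to black
plt.imshow(patches[0], cmap='gray', vmin=0, vmax=255)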

We get:

[Image: the 4 x 4 grid of plotted patches]

Upvotes: 7

ely

Reputation: 77464

According to this answer the issue is that imsave normalizes the data so that the computed minimum is defined as black (and, if there is a distinct maximum, that is defined as white).

This led me to go digging as to why the suggested use of uint8 did work to create the desired output. As it turns out, in the source there is a function called bytescale that gets called internally.

Actually, imsave itself is a very thin wrapper around toimage followed by save (from the image object). Inside of toimage if mode is None (which it is by default), that's when bytescale gets invoked.

It turns out that bytescale has an if statement that checks for the uint8 data type, and if the data is in that format, it returns the data unaltered. But if not, then the data is scaled according to a max and min transformation (where 0 and 255 are the default low and high pixel values to compare to).

This is the full snippet of code linked above:

if data.dtype == uint8:
    return data

if high < low:
    raise ValueError("`high` should be larger than `low`.")

if cmin is None:
    cmin = data.min()
if cmax is None:
    cmax = data.max()

cscale = cmax - cmin
if cscale < 0:
    raise ValueError("`cmax` should be larger than `cmin`.")
elif cscale == 0:
    cscale = 1

scale = float(high - low) / cscale
bytedata = (data * 1.0 - cmin) * scale + 0.4999
bytedata[bytedata > high] = high
bytedata[bytedata < 0] = 0
return cast[uint8](bytedata) + cast[uint8](low)

For the blocks of your data that are all 255, cscale will be 0, which will be checked for and changed to 1. Then the line

bytedata = (data * 1.0 - cmin) * scale + 0.4999

will result in the whole image block having the float value 0.4999, which then becomes 0 in the next chunk of code when it is cast from float to uint8, for example:

In [102]: np.cast[np.uint8](0.4999)
Out[102]: array(0, dtype=uint8)
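To trace that arithmetic end-to-end, here is a minimal sketch that redoes the steps from the snippet above by hand for an 8 x 8 block that is uniformly 255 (the names cmin, cmax, cscale, scale and bytedata mirror the snippet; 255 and 0 are the default high/low values mentioned earlier):

import numpy as np

data = np.full((8, 8), 255.0)        # a uniform white block stored as float
cmin, cmax = data.min(), data.max()  # both 255.0
cscale = cmax - cmin                 # 0, so it gets bumped to 1
if cscale == 0:
    cscale = 1
scale = float(255 - 0) / cscale
bytedata = (data * 1.0 - cmin) * scale + 0.4999
print(bytedata[0, 0])                    # 0.4999
print(bytedata.astype(np.uint8)[0, 0])   # 0 -- the whole block comes out black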

You can see in the body of bytescale that there are only two possible ways to return: either your data is type uint8 and it's returned as-is, or else it goes through this kind of silly scaling process. So in the end, it is indeed correct, and good practice, to be using uint8 for the pieces of your code that specifically load from or save to an image format via these functions.

So this cascade is why you were getting all zeros in the output image file, and why the other suggestion of using dtype=np.uint8 actually helps you. It's not that you need to avoid floating-point data for images in general; it's just this bizarre check-and-scale convention on the part of imsave.
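As a side note, if you do need to keep floating-point patches, one possible workaround (a sketch assuming the same old scipy.misc API that provides imsave) is to call toimage directly with an explicit cmin/cmax, so the scaling range is pinned instead of being computed from the patch itself:

import numpy as np
import scipy.misc as misc

float_patch = np.full((8, 8), 255.0)  # hypothetical uniform float patch
# Pinning cmin/cmax keeps a uniform 255.0 patch white instead of black
misc.toimage(float_patch, cmin=0.0, cmax=255.0).save("patch_float.png")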

Upvotes: 4
