Ziyad Edher

Reputation: 2150

Why does NumPy's random function seemingly display a pattern in its generated values?

I was playing around with NumPy and Pillow and came across an interesting result that apparently showcases a pattern in NumPy random.random() results.

Image One Image Two Image Three Image Four

Here is a sample of the full code for generating and saving 100 of these images (with seed 0); the images above are the first four generated by this code.

import numpy as np
from PIL import Image

np.random.seed(0)
img_arrays = np.random.random((100, 256, 256, 3)) * 255
for i, img_array in enumerate(img_arrays):
    img = Image.fromarray(img_array, "RGB")
    img.save("{}.png".format(i))

The above are four different images created by calling PIL.Image.fromarray() on four different NumPy arrays, each generated with numpy.random.random((256, 256, 3)) * 255 as a 256 by 256 grid of RGB values, in four different Python instances (the same thing also happens within a single instance).

I noticed that (in my limited testing) this only happens when the width and height of the image are powers of two; I am not sure how to interpret that.

Although it may be hard to see due to browser anti-aliasing (you can download the images and view them in image viewers with no anti-aliasing), there are clear purple-brown columns of pixels every 8th column starting from the 3rd column of every image. To make sure, I tested this on 100 different images and they all followed this pattern.

What is going on here? I am guessing that patterns like this are the reason that people always say to use cryptographically secure random number generators when true randomness is required, but is there a concrete explanation behind why this is happening in particular?

Upvotes: 13

Views: 7225

Answers (4)

Mark Amery

Reputation: 154765

As others have noted, these patterns have nothing to do with NumPy's random number generation; the problem is simply that PIL's 'RGB' mode expects to get an array of dtype uint8, and when given something else, tries to interpret the raw bytes as if they were an array of uint8s. Here, you are passing 8-byte float64s (NumPy's default when you don't specify a dtype) and this produces the result you see.

You're expecting every random number from 0-255 in your array to define the value for one color channel of one pixel, but in reality, it's defining the value for 8 successive color channels. For instance, the very first random number - which you intend to be the value of the "red" channel of the top-left pixel - is in fact defining the red, green, and blue channels of the top-left pixel and the one to the right of that and the red and green channels of the pixel to the right of that. Oops.
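To make the byte arithmetic above concrete, here is a small sketch. It is not Pillow's actual code; it only assumes a tightly packed RGB buffer with 3 bytes per pixel, and the helper name channels_covered is made up for illustration. It lists which pixel/channel slots each 8-byte float64 overlaps.

```python
import struct

BYTES_PER_FLOAT64 = struct.calcsize("<d")  # a float64 occupies 8 bytes
CHANNELS = "RGB"                           # 3 bytes per pixel in 'RGB' mode


def channels_covered(float_index):
    """Return the (pixel, channel) slots overlapped by one float64.

    Hypothetical helper: byte b of the buffer lands in pixel b // 3,
    channel b % 3, if the buffer is read as packed 8-bit RGB.
    """
    start = float_index * BYTES_PER_FLOAT64
    return [(b // 3, CHANNELS[b % 3])
            for b in range(start, start + BYTES_PER_FLOAT64)]


# The first float spills over pixels 0 and 1 and into R and G of pixel 2.
print(channels_covered(0))
```

Under these assumptions the first float covers all of pixels 0 and 1 plus the red and green channels of pixel 2, which matches the description above.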

The simplest test that shows that these patterns are not in fact emerging from NumPy's RNG is to simply set all the values in the array to 255 instead of random numbers, and then display that:

>>> import numpy as np
>>> from PIL import Image
>>> img_array = np.full((256, 256, 3), 255.0)
>>> print(img_array.dtype)
float64
>>> Image.fromarray(img_array, 'RGB').show()

Image output by the above code

Sure enough, we still see a pattern of vertical lines.
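The lines in that constant image can be roughly predicted from the raw bytes of 255.0. The sketch below assumes a little-endian float64 layout (hence the explicit "<d") and that the buffer is consumed three bytes per pixel as described above; it is an illustration, not a trace of what Pillow does internally.

```python
import struct

# Raw bytes of the float64 value 255.0, little-endian.
raw = list(struct.pack("<d", 255.0))
print(raw)  # [0, 0, 0, 0, 0, 224, 111, 64]

# The 8-byte pattern and the 3-byte pixels line up again after 24 bytes,
# i.e. after 3 floats or 8 pixels.
period = raw * 3
pixels = [tuple(period[i:i + 3]) for i in range(0, len(period), 3)]
for column, rgb in enumerate(pixels):
    print(column, rgb)
```

Because a 256-pixel row is 768 bytes, a whole multiple of 24, the same 8-pixel sequence would start again at the beginning of every row, so the non-black pixels stack into vertical lines.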

Upvotes: 1

FHTMitchell

Reputation: 12157

I'm pretty sure the problem is to do with the dtype, but not for the reasons you think. Here is one with np.random.randint(0, 256, (1, 256, 256, 3), dtype=np.uint32); note that the dtype is not np.uint8:

enter image description here

Can you see the pattern ;)? PIL interprets 32 bit (4 byte) values (probably as 4 pixels RGBK) differently from 8 bit values (RGB for one pixel). (See PM 2Ring's answer).

Originally you were passing 64-bit float values; these are also going to be interpreted differently (and probably not how you intended).

Upvotes: 3

Rob

Reputation: 1517

The Python Docs for random() say this:

Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

The best random number generators pass randomness tests; lesser-quality random number generators are often used because they are quick and deemed 'good enough'.

In "Some Difficult-to-Pass Tests of Randomness" Jan 2002, by Marsaglia and Tsang, they determined that a subset of the "Diehard Battery of Tests" could be used to assess the randomness of a series of numbers, specifically the gcd, gorilla and birthday spacings tests. See "Dieharder test descriptions" for a discussion of entropy and comments on those tests.

Over at Programming Puzzles & Code Golf, some people took a shot at developing code to pass the Diehard tests in this question: "Build a random number generator that passes the Diehard tests".

You should expect to see patterns in all but the best (and likely slower) RNGs.

The modern standard for statistical testing of RNGs, NIST SP 800-22, "A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications" (Overview), provides a series of tests which, among other things, assess how close the fraction of ones is to ½; that is, the number of ones and zeroes in a sequence should be about the same.
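For illustration, the ones-versus-zeroes check can be sketched in a few lines. This is a minimal version of the frequency (monobit) test as I understand it from the NIST document, not the reference implementation; the function name monobit_p_value, the seed, and the sample size are arbitrary choices.

```python
import math
import random


def monobit_p_value(bits):
    """p-value of a frequency (monobit) style test on a sequence of 0/1 bits.

    Maps each bit to +1 or -1, sums them, and compares the normalised
    absolute sum against a half-normal distribution via erfc.
    """
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Python's random module uses the Mersenne Twister, as quoted above.
rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(10_000)]
p = monobit_p_value(bits)
print(f"p-value = {p:.3f}")
```

A very small p-value (NIST uses 0.01 as the threshold) would suggest the bits are unbalanced; a perfectly balanced sequence gives a p-value of 1.0 and an all-ones sequence gives a p-value of essentially 0.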

An article published on the ACM website, "Algorithm 970: Optimizing the NIST Statistical Test Suite and the Berlekamp-Massey Algorithm", January 2017, by Sýs, Říha and Matyáš, promises an enormous speedup of the NIST algorithms with their reimplementation.

Upvotes: -1

PM 2Ring

Reputation: 55479

Don't blame NumPy, blame PIL / Pillow. ;) You're generating floats, but PIL expects integers, and its float-to-int conversion is not doing what we want. Further research is required to determine exactly what PIL is doing...

In the meantime, you can get rid of those lines by explicitly converting your values to unsigned 8-bit integers:

img_arrays = (np.random.random((100, 256, 256, 3)) * 255).astype(np.uint8)

As FHTMitchell notes in the comments, a more efficient form is

img_arrays = np.random.randint(0, 256, (100, 256, 256, 3), dtype=np.uint8) 

Here's typical output from that modified code:

random image made using Numpy
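A quick way to see why the conversion matters is to compare buffer sizes before and after it. The sketch below uses a tiny 4 by 4 array purely for illustration and stops short of calling Pillow; the variable names are my own.

```python
import numpy as np

# Same recipe as in the question, on a tiny 4x4 RGB array.
floats = np.random.random((4, 4, 3)) * 255  # dtype float64: 8 bytes per channel
fixed = floats.astype(np.uint8)             # dtype uint8: 1 byte per channel

print(floats.dtype, floats.nbytes)  # float64 384
print(fixed.dtype, fixed.nbytes)    # uint8 48
```

Only the uint8 array has exactly height * width * 3 bytes, which is the layout an 8-bit RGB image needs. Note that astype(np.uint8) truncates, so with random() * 255 the value 255 itself never appears; the randint(0, 256, ...) form covers the full 0-255 range.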


The PIL Image.fromarray function has a known bug, as described here. The behaviour you're seeing is probably related to that bug, but I guess it could be an independent one. ;)

FWIW, here are some tests and workarounds I did on the bug mentioned on the linked question.

Upvotes: 17
