John Difool

Reputation: 5702

Pixel coordinates vs Drawing Coordinates

In the code snippet below, passing x and y values puts the dot in (y,x) coordinates while the drawing is done in (x,y). What is the correct way to set up the drawing buffer so it's placing pixels and drawing in the same coordinate system?

import numpy as np
from PIL import Image, ImageDraw

def visual_test(x, y):
    grid = np.zeros((100, 100, 3), dtype=np.uint8)
    grid[:] = [0, 0, 0]
    grid[x, y] = [255, 0, 0]
    img = Image.fromarray(grid, 'RGB')
    draw = ImageDraw.Draw(img)
    draw.line((x, y, x, y-5), fill=(255,255,255), width=1)
    img.show()

Upvotes: 5

Views: 14842

Answers (1)

handle

Reputation: 6329

Note: with "axis" I refer to image coordinates, not to NumPy's array dimensions.

The issue is with the interpretation of ndarray's dimensions ("The N-dimensional array"), or the definition of a coordinate system in that context.

For Pillow, it's clear:

Coordinate System

The Python Imaging Library uses a Cartesian pixel coordinate system, with (0,0) in the upper left corner. Note that the coordinates refer to the implied pixel corners; the centre of a pixel addressed as (0, 0) actually lies at (0.5, 0.5).

Coordinates are usually passed to the library as 2-tuples (x, y). Rectangles are represented as 4-tuples, with the upper left corner given first. For example, a rectangle covering all of an 800x600 pixel image is written as (0, 0, 800, 600).

Visualized, that looks like this (image in the public domain):

Pillow's XY coordinate system
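A quick way to verify Pillow's (x, y) ordering is a non-square image, so the two axes can't be confused (a minimal sketch; size and coordinates chosen arbitrarily):

```python
import numpy as np
from PIL import Image

# 3 wide, 2 tall; note that Image.new takes (width, height)
img = Image.new('RGB', (3, 2))
pixels = img.load()
pixels[2, 1] = (255, 0, 0)   # Pillow: (x, y) = (column 2, row 1)

arr = np.asarray(img)
print(arr.shape)             # (2, 3, 3): (rows, columns, channels)
print(arr[1, 2])             # NumPy: [row, column] -> [255 0 0]
```

The pixel written at Pillow coordinates (2, 1) comes back at ndarray index [1, 2].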

Your code, modified to create a 2x2 pixel image:

import numpy as np
from PIL import Image # Pillow

w, h, d = 2, 2, 3
x, y = 0, 1

grid = np.zeros((w, h, d), dtype=np.uint8) # NumPy array for image data
#test = np.zeros(w*h*d, dtype=np.uint8).reshape(w, h, d)
#print(np.array_equal(grid,test)) # => True

# red pixel with NumPy
grid[x, y] = [255, 0, 0]

print(grid[::])

# green pixel with Pillow
img = Image.fromarray(grid, 'RGB')
pixels = img.load()
pixels[x,y] = (0, 255, 0)

# display temporary image file with default application
scale = 100
img.resize((w*scale,h*scale)).show()

shows the issue (draw pixel at (0,1), green: Pillow, red: ndarray):

generated image

X and Y indeed are swapped:

ndarrays YX axes

Is it because of NumPy or Pillow?

The ndarray prints as

[[[  0   0   0]
  [255   0   0]]

 [[  0   0   0]
  [  0   0   0]]]

which is easily reformatted to visually correspond to the image pixels

[
 [ [  0   0   0] [255   0   0] ]
 [ [  0   0   0] [  0   0   0] ]
]

which shows that Pillow interprets the array as one would expect.

But why does NumPy's ndarray seem to swap the axes?

Let's take this apart a bit further

[ # grid
 [ # grid[0]
   [  0   0   0]  # grid[0,0]
                  [255   0   0]  # grid[0,1]
 ]
 [ # grid[1]
   [  0   0   0]  # grid[1,0]
                  [  0   0   0]  # grid[1,1]
 ]
]

Let's test this (-i has Python run in interactive mode once the script is finished):

>py -i t.py
[[[  0   0   0]
  [255   0   0]]

 [[  0   0   0]
  [  0   0   0]]]
>>> grid[0,1]
array([255,   0,   0], dtype=uint8)
>>> grid[0]
array([[  0,   0,   0],
       [255,   0,   0]], dtype=uint8)
>>> ^Z

which confirms the assumed indexes above.

It becomes obvious how the first dimension of the ndarray corresponds to the image lines or Y axis, the second to the image columns or X axis (and the third obviously to the RGB pixel values).
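This mapping is easy to confirm with a non-square array, since Pillow reports its size as (width, height) while the array's shape is (rows, columns, channels). A minimal check (dimensions picked arbitrarily):

```python
import numpy as np
from PIL import Image

rows, cols = 2, 5                           # 2 image lines, 5 columns
grid = np.zeros((rows, cols, 3), dtype=np.uint8)

img = Image.fromarray(grid, 'RGB')
print(grid.shape)                           # (2, 5, 3)
print(img.size)                             # (5, 2): (width, height)
```

The first array dimension (2) becomes the image height, the second (5) the width.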

So, to match the "coordinate systems", either ...

  1. ... the axes need to be "swapped"
  2. ... the data needs to be "swapped"
  3. ... the axis interpretation needs to be "swapped"

Let's see:

1. Simply swapping the index variables when writing to the ndarray:

# red pixel with NumPy
grid[y, x] = [255, 0, 0]

expectedly results in

[[[  0   0   0]
  [  0   0   0]]

 [[255   0   0]
  [  0   0   0]]]

and

generated image

Of course a wrapper function could do this.
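Such a wrapper (a sketch; the name set_pixel is made up here) would simply hide the reordering:

```python
import numpy as np

def set_pixel(grid, x, y, color):
    """Write `color` at image coordinates (x, y), i.e. column x, row y."""
    grid[y, x] = color                      # ndarray indexing is [row, column]

grid = np.zeros((2, 2, 3), dtype=np.uint8)
set_pixel(grid, 0, 1, [255, 0, 0])          # same red pixel as above
print(grid[1, 0])                           # -> [255   0   0]
```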

2. Transposing the array, as suggested by zch, does not work that easily on a 3-dimensional array, since this function affects all dimensions by default:

grid = np.transpose(grid)
print("transposed\n", grid)
print("shape:", grid.shape)

results in

[[[  0   0]
  [255   0]]

 [[  0   0]
  [  0   0]]

 [[  0   0]
  [  0   0]]]
shape: (3, 2, 2)

and, because the Pillow image mode is still specified as RGB, an exception is thrown:

ValueError: not enough image data

But there is an additional argument to np.transpose, axes:

...permute the axes according to the values given.

We want to swap only 0 and 1, but not 2, so:

grid = np.transpose(grid, (1,0,2))

There are other functions that operate similarly, e.g.

grid = np.swapaxes(grid,0,1)
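Both calls produce the same result; a quick check (a sketch, using the same 2x2 grid as above):

```python
import numpy as np

grid = np.zeros((2, 2, 3), dtype=np.uint8)
grid[0, 1] = [255, 0, 0]

a = np.transpose(grid, (1, 0, 2))           # permute only the first two axes
b = np.swapaxes(grid, 0, 1)                 # same effect

print(np.array_equal(a, b))                 # True
print(a.shape)                              # still (2, 2, 3)
print(a[1, 0])                              # the red pixel, now at [1, 0]
```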

3. Change the interpretation?

Can Pillow's PIL.Image.fromarray be made to interpret the ndarray with swapped axes? Apart from mode for the color mode, it has no other arguments (really, see the source code).

Creates an image memory from an object exporting the array interface (using the buffer protocol). If obj is not contiguous, then the tobytes method is called and frombuffer() is used.

The function figures out how to call PIL.Image.frombuffer() (source), which has a few more options for the "decoder".

Array interface? Buffer protocol? That's both a little too low-level for now...

TL;DR
Just swap the index variables (option 1)!
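Applied to the question's visual_test, that is a one-line change (a sketch; sizes and colors as in the question). Only the ndarray write needs its indices swapped; the Pillow draw call already uses (x, y). Note that the line now starts on the red pixel and overdraws it, since (x, y) is one of its endpoints:

```python
import numpy as np
from PIL import Image, ImageDraw

def visual_test(x, y):
    grid = np.zeros((100, 100, 3), dtype=np.uint8)
    grid[y, x] = [255, 0, 0]                # ndarray indexing: [row, column]
    img = Image.fromarray(grid, 'RGB')
    draw = ImageDraw.Draw(img)
    draw.line((x, y, x, y - 5), fill=(255, 255, 255), width=1)
    img.show()
```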


Further reading:

- https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

Upvotes: 17
