Carl Philipp

Reputation: 181

Obscure behaviour of python in need of explanation

I have run into some obscure Python behaviour that I am trying to understand. I am pretty sure it is not a bug, but I can't explain why it happens. I am trying to load a few images into a list and then manipulate them.

Here is a minimal example:

import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:

                data_internal.append(f)
                
    return data_internal


data = load_data()
print(data[0])
print(np.array(data[0]))

This code produces the following:

> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x1EDCC735708>
> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x1EDCC735708>

So data[0] and np.array(data[0]) print exactly the same thing.

However, if I add a line inside the loading loop that prints the first list entry as a np.array(), the behaviour changes:

import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:

                data_internal.append(f)
                print(np.array(data_internal[0]))
    return data_internal


data = load_data()
print(data[0])
print(np.array(data[0]))

The output of that example is

> [[[146 135 129]
>  [145 134 128]]
>
> [[148 137 133]
>  [148 137 133]]]
>
><PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x261491DEF48>
>
>[[[146 135 129]
>  [145 134 128]]
>
> [[148 137 133]
>  [148 137 133]]]

as I would expect.

Can anyone please tell me why merely accessing the list entry (cast to a numpy array) inside the loop makes this possible?

Thanks a lot & best regards

Upvotes: 1

Views: 71

Answers (1)

James

Reputation: 36756

You are opening the image files for reading using a context manager (with) and appending the Image object to a list. However, once the context closes, which happens when the indented with block ends, the Image object closes its pointer to the file.

For speed and efficiency, PIL does not load in all of the image data right away, but only when needed, such as when it is converted to a numpy array. The first time the data is requested it is read in and also saved to the Image object so it can be reused again.

So that is the difference: calling np.array(data_internal[0]) while still inside the context means that the pointer to the file is still open for reading. The data is read and saved to the Image object, and calling np.array again just returns the previously read data.

Here are 3 examples:

  1. reading with no context
x = Image.open('pic.jpg')

np.array(x)
# returns:
array([[[ 55,  39,  42],
        [ 61,  45,  48],
        [ 55,  39,  42],
        ...,
        [110,  93,  83],
        [111,  94,  84],
        [121, 104,  94]],
...

x.getdata()
# returns:
<ImagingCore at 0x2b21acd0950>

  2. opening within a context, but without loading data
with Image.open('pic.jpg') as f:
    y = f
    y = f

np.array(y)
# returns a zero-dimensional array wrapping the Image object:
array(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x3264 at 0x2B21BA99A48>,
      dtype=object)

y.getdata()
# raises an error
...
AttributeError: 'NoneType' object has no attribute 'seek'

  3. open with context and load data
with Image.open('pic.jpg') as f:
    z = f
    # force loading of the data here:
    np.array(z)
    z = f
    # force loading of the data here:
    np.array(z)

np.array(z)
# returns:
array([[[ 55,  39,  42],
        [ 61,  45,  48],
        [ 55,  39,  42],
        ...,
        [110,  93,  83],
        [111,  94,  84],
        [121, 104,  94]],
...

z.getdata()
# returns:
<ImagingCore at 0x2b21a74fb70>
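
Applied to the loader from the question, one possible fix is to make sure the pixel data is read before the with block ends. The sketch below is only an illustration, not the one correct way to do it: the names load_images and folder are hypothetical, and it stores img.copy(), which reads the data and returns an in-memory image that no longer depends on the open file. It also joins each file name with root from os.walk, on the assumption that images may sit in subfolders.

```python
import os

import numpy as np
from PIL import Image


def load_images(folder):
    """Return in-memory copies of every image file found under `folder`."""
    images = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            # Build the path from `root` so files in subfolders are found too.
            path = os.path.join(root, name)
            with Image.open(path) as img:
                # copy() loads the pixel data while the file is still open
                # and returns an image that does not need the file afterwards.
                images.append(img.copy())
    return images
```

With this version, np.array(images[0]) can be called after the loader has returned, because the data was already read while the file was open. If only the arrays are needed, appending np.array(img) inside the with block should work in the same way.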

Upvotes: 2
