Obscure behaviour of python in need of explanation

Question

I have run into some obscure Python behaviour that I am trying to understand. I am pretty sure it is not a bug, but I can't explain why it behaves like this. I am trying to load a few images into a list and then manipulate them.

Here is a minimal example:

import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:

                data_internal.append(f)
                
    return data_internal


data = load_data()
print(data[0])
print(np.array(data[0]))

This code produces the following:

> 
>

So data[0] and np.array(data[0]) print exactly the same thing.

However, if I add a line that prints the image as an np.array() inside the loading loop, the behaviour changes:

import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:

                data_internal.append(f)
                print(np.array(data_internal[0]))
    return data_internal


data = load_data()
print(data[0])
print(np.array(data[0]))

The output of that example is

> [[[146 135 129]
>  [145 134 128]]
>
> [[148 137 133]
>  [148 137 133]]]
>
>
>
>[[[146 135 129]
>  [145 134 128]]
>
> [[148 137 133]
>  [148 137 133]]]

as I would expect.

Can anyone please tell me why merely accessing the list entry, converted to a numpy array, inside the loop makes this possible?

Thanks a lot & best regards

James · Accepted Answer

You are opening the image files with a context manager (with) and appending each Image object to a list. However, once the context exits, that is, when the indented with block ends, the Image object closes its handle to the file.

For speed and efficiency, PIL does not load all of the image data right away, but only when it is needed, such as when the image is converted to a numpy array. The first time the data is requested, it is read in and also cached on the Image object so that it can be reused.

That is the difference: calling np.array(data_internal[0]) inside the context means the file is still open for reading. The data is read and stored on the Image object, and calling np.array again later just returns the previously read data.
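
To make the mechanism concrete, here is a toy model of it using only the standard library. The LazyImage class and the pic.bin file are hypothetical stand-ins made up for this sketch. It is not how Pillow is actually implemented, only the same open-lazily, cache-on-first-read pattern.

```python
import os
import tempfile


class LazyImage:
    """Toy stand-in for a lazily loaded image (hypothetical, not Pillow's code)."""

    def __init__(self, path):
        self.fp = open(path, "rb")  # opening is cheap: no data is read yet
        self._data = None           # cache, filled on the first load()

    def load(self):
        if self._data is None:
            if self.fp is None:
                raise ValueError("file is closed and the data was never loaded")
            self._data = self.fp.read()
        return self._data

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Leaving the with block closes the file, but keeps any cached data.
        self.fp.close()
        self.fp = None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pic.bin")
    with open(path, "wb") as out:
        out.write(b"\x01\x02\x03")

    # Case 1: never touched inside the context, so nothing was cached.
    with LazyImage(path) as a:
        pass
    try:
        a.load()
        never_loaded_ok = True
    except ValueError:
        never_loaded_ok = False

    # Case 2: loaded once inside the context, so the cache survives the close.
    with LazyImage(path) as b:
        b.load()
    cached = b.load()
```

In this model, a.load() fails after the with block because the file is closed and nothing was cached, while b.load() still works because the data was read while the file was open.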

Here are 3 examples:

reading with no context

x = Image.open('pic.jpg')

np.array(x)
# returns:
array([[[ 55,  39,  42],
        [ 61,  45,  48],
        [ 55,  39,  42],
        ...,
        [110,  93,  83],
        [111,  94,  84],
        [121, 104,  94]],
...

x.getdata()
# returns:

opening within a context, but without loading data

with Image.open('pic.jpg') as f:
    y = f

np.array(y)
# returns a zero-dimension array of the object:
array(,
      dtype=object)

y.getdata()
# raises an error
...
AttributeError: 'NoneType' object has no attribute 'seek'

open with context and load data

with Image.open('pic.jpg') as f:
    z = f
    # force loading of the data here:
    np.array(z)

np.array(z)
# returns:
array([[[ 55,  39,  42],
        [ 61,  45,  48],
        [ 55,  39,  42],
        ...,
        [110,  93,  83],
        [111,  94,  84],
        [121, 104,  94]],
...

z.getdata()
# returns:
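
Applied to the loader in the question, one possible fix is to force the read with Image.load() before the with block ends. This is a sketch rather than the only way to do it. The folder argument, the os.path.join(root, name) path handling, and the small generated pic.png test image are assumptions added for illustration.

```python
import os
import tempfile

from PIL import Image


def load_data(folder):
    """Return fully loaded images found under folder (hypothetical rewrite)."""
    images = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            with Image.open(os.path.join(root, name)) as f:
                f.load()  # read the pixel data while the file is still open
                images.append(f)
    return images


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny solid-colour image so the example needs no external files.
    Image.new("RGB", (2, 2), (146, 135, 129)).save(os.path.join(tmp, "pic.png"))
    data = load_data(tmp)
    # The file is already closed here, but the pixels were loaded earlier.
    first_pixel = data[0].getpixel((0, 0))
```

With the images loaded this way, converting data[0] with np.array after load_data returns should give the pixel values, since the data is already stored on the Image object.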
