Reputation: 181
I have run into some obscure behaviour in Python that I am trying to understand. I am pretty sure it is not a bug, but I can't explain why it behaves this way. I am trying to load a few images into a list and then manipulate them.
Here is a minimal example:
import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:
                data_internal.append(f)
    return data_internal

data = load_data()
print(data[0])
print(np.array(data[0]))
This code produces the following:
> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x1EDCC735708>
> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x1EDCC735708>
so `data[0]` and `np.array(data[0])` print exactly the same thing.
However, if I add a line that prints `f` as an `np.array()` inside the read-in loop, the behaviour changes:
import numpy as np
from PIL import Image
import os

DATA_URL = ('./images/')

def load_data():
    data_internal = list()
    for root, dirs, files in os.walk(DATA_URL, topdown=False):
        for name in files:
            with Image.open(DATA_URL + name) as f:
                data_internal.append(f)
                print(np.array(data_internal[0]))
    return data_internal

data = load_data()
print(data[0])
print(np.array(data[0]))
The output of that example is
> [[[146 135 129]
>   [145 134 128]]
>
>  [[148 137 133]
>   [148 137 133]]]
>
> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2x2 at 0x261491DEF48>
>
> [[[146 135 129]
>   [145 134 128]]
>
>  [[148 137 133]
>   [148 137 133]]]
as I would expect.
Can anyone please tell me why merely converting the list entry to a numpy array inside the loop makes the later conversion work as well?
Thanks a lot & best regards
Upvotes: 1
Views: 71
Reputation: 36756
You are opening the image files for reading using a context manager (`with`) and appending the `Image` object to a list. However, once the context exits, that is, when the indented `with` block ends, the `Image` object closes its handle to the file.
For speed and efficiency, `PIL` does not load all of the image data right away, but only when needed, such as when the image is converted to a numpy array. The first time the data is requested, it is read in and also saved on the `Image` object so it can be reused.
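The lazy loading and caching described above can be mimicked with a toy class. This is only a sketch of the general pattern, not Pillow's actual implementation; `LazyFile`, its `load` method, and `demo` are hypothetical names made up for illustration.

```python
import os
import tempfile


class LazyFile:
    """Toy stand-in for a lazily loaded image (hypothetical, not Pillow code)."""

    def __init__(self, path):
        self._fp = open(path, "rb")  # the handle is opened immediately
        self._data = None            # but nothing has been read yet

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._fp.close()             # leaving the `with` block closes the handle
        self._fp = None

    def load(self):
        if self._data is None:       # first request: read from the open file
            if self._fp is None:
                raise ValueError("file closed before any data was read")
            self._data = self._fp.read()
        return self._data            # later requests reuse the cached bytes


def demo():
    """Compare reading inside the `with` block against reading after it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"pixels")
        path = tmp.name
    try:
        with LazyFile(path) as early:
            early.load()             # data requested while the file is open
        with LazyFile(path) as late:
            pass                     # data never requested inside the block
        cached = early.load()        # served from the cache
        try:
            late.load()
            late_ok = True
        except ValueError:
            late_ok = False          # nothing cached and the handle is gone
        return cached, late_ok
    finally:
        os.remove(path)
```

In this sketch, `early` can still hand out its data after the block has ended because it was read once while the handle was open, whereas `late` has nothing cached and no handle left to read from.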
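So that is the difference: calling `np.array(data_internal[0])` while still inside the context means that the file is still open for reading. The data is read and saved on the `Image` object, and calling `np.array` again just returns the previously read data. In your first version, the file is already closed by the time `np.array` is called, so the pixel data cannot be read and numpy simply wraps the `Image` object itself.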
Here are 3 examples:
x = Image.open('pic.jpg')
np.array(x)
# returns:
array([[[ 55, 39, 42],
[ 61, 45, 48],
[ 55, 39, 42],
...,
[110, 93, 83],
[111, 94, 84],
[121, 104, 94]],
...
x.getdata()
# returns:
<ImagingCore at 0x2b21acd0950>
with Image.open('pic.jpg') as f:
    y = f
np.array(y)
# returns a zero-dimension array of the object:
array(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x3264 at 0x2B21BA99A48>,
dtype=object)
y.getdata()
# raises an error
...
AttributeError: 'NoneType' object has no attribute 'seek'
with Image.open('pic.jpg') as f:
    z = f
    # force loading of the data here:
    np.array(z)
np.array(z)
# returns:
array([[[ 55, 39, 42],
[ 61, 45, 48],
[ 55, 39, 42],
...,
[110, 93, 83],
[111, 94, 84],
[121, 104, 94]],
...
z.getdata()
# returns:
<ImagingCore at 0x2b21a74fb70>
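Applied to the loader from the question, one possible fix is to take the pixel data while the file is still open. The sketch below does this with `Image.copy()`, which loads the image and returns an in-memory copy. The `data_dir` parameter and the switch to `os.path.join(root, name)` are my own assumptions and are not part of the original code.

```python
import os

from PIL import Image


def load_data(data_dir):
    """Return in-memory copies of every image found under data_dir."""
    data_internal = []
    for root, dirs, files in os.walk(data_dir, topdown=False):
        for name in files:
            # Join with root so files in subdirectories resolve correctly.
            with Image.open(os.path.join(root, name)) as f:
                # copy() reads the pixel data while the file is still open
                # and returns an image that no longer depends on the file.
                data_internal.append(f.copy())
    return data_internal
```

With this version, `np.array(data[0])` on the result of `load_data('./images/')` should give the pixel array even though every file has already been closed.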
Upvotes: 2