ytrewq

Reputation: 3930

python numpy: array of arrays

I'm trying to build a numpy array of arrays of arrays with the code below, but it fails with:

ValueError: setting an array element with a sequence.

My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.

How can I fix the code below so that I can build an array of arrays of arrays?

from PIL import Image
import pickle
import os
import numpy

indir1 = 'PositiveResize'

trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)

i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
            try:
                    im = Image.open(os.path.join(root,f))
                    Imv=im.load()
                    x,y=im.size
                    pixelv = numpy.empty(6400)
                    ind=0
                    for i in range(x):
                            for j in range(y):
                                    temp=float(Imv[j,i])
                                    temp=float(temp/255.0)
                                    pixelv[ind]=temp
                                    ind+=1
                    if i<40000:
                            trainpixels[tr]=pixelv
                            tr+=1
                    elif i<45000:
                            validpixels[va]=pixelv
                            va+=1
                    else:
                            testpixels[te]=pixelv
                            te+=1
                    print str(i)+'\t'+str(f)
                    i+=1
            except IOError:
                    continue

trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels
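For reference, the error can be reproduced in isolation. The sketch below assumes the 80 x 80 images from the code above (6400 pixels each) and uses random values in place of real image data; the small image count is hypothetical. It shows that assigning a 6400-element vector into one slot of a 1-D array fails, while a 2-D array with one row per image accepts it.

```python
import numpy as np

n_images = 3          # hypothetical small count for illustration
n_pixels = 80 * 80    # 6400 pixels per image, as in the question

pixelv = np.random.rand(n_pixels)  # stand-in for one flattened image

# 1-D array: each slot holds a single float, so a whole vector cannot fit.
flat = np.empty(n_images)
try:
    flat[0] = pixelv
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)

# 2-D array: one row per image, one column per pixel.
pixels = np.empty((n_images, n_pixels))
pixels[0] = pixelv
print(pixels.shape)
```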

Upvotes: 1

Views: 2073

Answers (2)

Roger Fan

Reputation: 5045

Don't try to smash your entire object into a numpy array. If you have distinct things, use a numpy array for each one then use an appropriate data structure to hold them together.

For instance, if you want to do computations across images, you probably want to store the pixels and labels in separate arrays.

import numpy as np
trainpixels = np.empty([10000, 80, 80])
trainlabels = np.empty(10000)
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...
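If the pixels arrive as a flat 6400-element vector, as `pixelv` does in the question, one way to fill a slot of the 3-D array is to reshape the vector to 80 x 80 first. This sketch shrinks the image count and uses a hypothetical `load_flat_pixels` helper (random stand-in data) and hypothetical 0/1 labels in place of the real image-loading code:

```python
import numpy as np

n_images = 5  # hypothetical; the answer uses 10000

def load_flat_pixels(i):
    """Hypothetical stand-in for reading image i as a flat 6400-float vector."""
    rng = np.random.default_rng(i)
    return rng.random(80 * 80)

trainpixels = np.empty([n_images, 80, 80])
trainlabels = np.empty(n_images)
for i in range(n_images):
    # reshape turns the flat vector into an 80x80 image that fits one slot
    trainpixels[i] = load_flat_pixels(i).reshape(80, 80)
    trainlabels[i] = i % 2  # hypothetical 0/1 label
```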

To access an individual image's data:

imagepixels = trainpixels[253]
imagelabel = trainlabels[253]

You can also easily compute summary statistics across the images:

meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
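A runnable version with random stand-in data (100 images instead of 10000, hypothetical 0/1 labels) shows what the two calls return: the mean over `axis=0` is itself an 80 x 80 image, while the label mean is a single number. The last line, a boolean mask on the labels, is an extra illustration of another cross-image operation this layout allows.

```python
import numpy as np

rng = np.random.default_rng(0)
trainpixels = rng.random((100, 80, 80))     # 100 random stand-in images
trainlabels = rng.integers(0, 2, size=100)  # hypothetical 0/1 labels

meanimage = np.mean(trainpixels, axis=0)  # average value at each pixel position
meanlabel = np.mean(trainlabels)          # fraction of images labelled 1

# Boolean mask on the labels: mean image over only the images labelled 1.
positive_mean = trainpixels[trainlabels == 1].mean(axis=0)
```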

If you really want all the data to be in the same object, you should probably use a structured array, as Eelco Hoogendoorn suggests. Some example usage:

# Construction and assignment
trainimages = np.empty(10000, dtype=[('label', np.int64), ('pixel', np.float64, (80, 80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...

# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])

# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
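A small runnable sketch of the structured-array approach follows, with random stand-in data and a hypothetical image count. It uses `np.int64` for labels and `np.float64` for pixels, since the question scales pixels into the range 0 to 1 and an integer field would truncate them; the bare `np.int` alias was deprecated in NumPy 1.20 and removed in 1.24. It also shows that a single record's fields can be read by name.

```python
import numpy as np

n_images = 4  # hypothetical small count; the answer uses 10000
image_dtype = np.dtype([
    ("label", np.int64),
    ("pixel", np.float64, (80, 80)),  # one 80x80 float image per record
])

rng = np.random.default_rng(1)
trainimages = np.zeros(n_images, dtype=image_dtype)
for i in range(n_images):
    trainimages["label"][i] = i % 2            # hypothetical 0/1 label
    trainimages["pixel"][i] = rng.random((80, 80))

# Each field is an ordinary array spanning all records.
meanimage = trainimages["pixel"].mean(axis=0)

# A single record behaves like a small struct with named fields.
image = trainimages[2]
imagepixels = image["pixel"]
imagelabel = image["label"]
```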

Alternatively, if you want to process each image separately, you could store each image's data in separate arrays and bind them together in a tuple or dictionary, then store all of that in a list.

trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)

Now, to access a single image's data:

imagepixels, imagelabel = trainimages[253]

This makes it more intuitive to access a single image, but because the data is not in one big numpy array, you lose easy access to functions that work across images.
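If cross-image computations are needed later with this layout, one option is to stack the per-image arrays back into a single array with `np.stack`, which copies the data. A sketch with random stand-in data and a hypothetical image count and labels:

```python
import numpy as np

rng = np.random.default_rng(2)
trainimages = []
for i in range(6):
    pixels = rng.random((80, 80))  # stand-in for one image's pixels
    label = i % 2                  # hypothetical 0/1 label
    trainimages.append((pixels, label))

# Per-image access is a plain tuple unpack.
imagepixels, imagelabel = trainimages[3]

# np.stack copies the list back into one (N, 80, 80) array
# so functions that work across images can be used again.
allpixels = np.stack([p for p, _ in trainimages])
alllabels = np.array([lab for _, lab in trainimages])
meanimage = allpixels.mean(axis=0)
```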

Upvotes: 1

Brian Cain

Reputation: 14619

Refer to the examples in the numpy.empty documentation:

>>> np.empty([2, 2])
array([[ -9.74499359e+001,   6.69583040e-309],
       [  2.13182611e-314,   3.06959433e-309]])         #random

Give your image arrays a shape with as many dimensions as you need, for example:

testpixels = numpy.empty([96, 96])
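Applying the same idea to the question's data, the leading dimension can be the number of images. The sketch below assumes 80 x 80 images and a hypothetical image count. As the documentation example shows, `numpy.empty` leaves the values uninitialised, so `numpy.zeros` is the safer choice for any array whose slots might not all be filled.

```python
import numpy

n_images = 10  # hypothetical; use the real number of images
testpixels = numpy.empty([n_images, 80, 80])  # uninitialised: contents arbitrary
testlabels = numpy.zeros(n_images)            # zeros: defined starting values

image = numpy.full((80, 80), 0.5)  # stand-in 80x80 image
testpixels[0] = image              # shapes match, so no ValueError
```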

Upvotes: 1
