sergejsrk

Reputation: 13

Adding an additional dimension to ndarray

I have an ndarray defined as follows:

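import numpy as np
# uninitialized float32 array (like np.empty): one image_size x image_size image per file
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                     dtype=np.float32)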

This array holds a collection of images, each of size image_size x image_size. Indexing it as dataset[0] returns the 2D array for the image at index 0.
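A minimal sketch of that indexing, assuming hypothetical values for image_files and image_size (the question does not show them). It uses np.empty, which, like np.ndarray(shape=...), allocates memory without initializing it:

```python
import numpy as np

# Hypothetical values; the question does not show image_files or image_size.
image_files = ["a.png", "b.png", "c.png"]
image_size = 28

# Uninitialized float32 array: one image_size x image_size image per file.
dataset = np.empty((len(image_files), image_size, image_size), dtype=np.float32)

first = dataset[0]     # the 2D image stored at index 0
print(dataset.shape)   # (3, 28, 28)
print(first.shape)     # (28, 28)
```

Note that dataset[0] is a view into dataset, so writing to it changes the stored image.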

Now I would like to store one additional field for each image in this array. For instance, for the image at index 0 I would like to store the number 123, and for the image at index 321 the number 50000.

What is the simplest way to add this additional data field to the existing ndarray? What is the appropriate way to access data in the new array after adding this additional dimension?

Upvotes: 1

Views: 109

Answers (2)

Bobby Ocean

Reputation: 3328

NumPy arrays are fundamentally tensors: every axis has a single, fixed length, so the array is rectangular across all of its axes. The shape cannot vary from one element to the next. Take for example:

import numpy as np

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]]
             ])
print(x.shape)  # two 2x2 blocks, so the shape is (2, 2, 2)

If I want to associate x[0] with the number 5 and x[1] with the number 7, that would look something like this (if it were possible):

# not a valid tensor: the elements have mismatched shapes
x = np.array([[[1,2],[3,4]],5,
              [[5,6],[7,8]],7
             ])

But such a thing is impossible, since it would, in some sense, have a shape like (2,((2,2),1)), or something else that is ambiguous. Such an object is not a NumPy array or a tensor: it doesn't have fixed axis sizes, and all NumPy arrays must have fixed axis sizes. (NumPy 1.24 and later raise a ValueError for ragged input like this unless dtype=object is passed.) Hence, if you wish to store the new information, the simplest way to do it is to create another array:

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]],
             ])
y = np.array([5,7])

Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
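Applied to the question's numbers, the same idea looks roughly like this. This is a sketch that assumes hypothetical sizes (400 images of 28x28) and a hypothetical name, labels, for the second array:

```python
import numpy as np

# Hypothetical sizes; the question's image_files and image_size are not shown.
n_images = 400
image_size = 28

dataset = np.zeros((n_images, image_size, image_size), dtype=np.float32)

# One extra number per image, kept in a separate 1D array of length n_images.
# int64 comfortably holds values such as 123 and 50000.
labels = np.zeros(n_images, dtype=np.int64)
labels[0] = 123
labels[321] = 50000

# The same index selects an image and its associated number.
i = 321
image, label = dataset[i], labels[i]
print(image.shape, label)  # (28, 28) 50000
```

Any indexing you apply to dataset (slices, boolean masks, index arrays) has to be applied to labels as well to keep the two aligned.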

Upvotes: 0

hpaulj

Reputation: 231385

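If the extra numbers are meant to identify where each image came from (for example, after shuffling the dataset), shuffle an index array instead of the dataset itself. The index array then keeps track of the original 'identifiers':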

idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
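The same index array can also carry a separate per-image array through the shuffle. A minimal sketch, assuming a tiny hypothetical dataset and a hypothetical labels array; it uses NumPy's Generator API (np.random.default_rng) in place of np.random.shuffle:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 tiny 2x2 "images" and one extra number per image.
dataset = np.arange(5 * 2 * 2, dtype=np.float32).reshape(5, 2, 2)
labels = np.array([123, 7, 42, 50000, 9])

# Shuffle an index array once and apply it to both arrays,
# so each image keeps its own number after shuffling.
idx = np.arange(len(dataset))
rng.shuffle(idx)
shuffle_set = dataset[idx]
shuffle_labels = labels[idx]

# idx[k] is the original position of the k-th shuffled image.
for k, orig in enumerate(idx):
    assert np.array_equal(shuffle_set[k], dataset[orig])
    assert shuffle_labels[k] == labels[orig]
```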

An illustration with a small array:

In [20]: x = np.arange(12).reshape(6,2)
    ...: idx = np.arange(6)
    ...: np.random.shuffle(idx) 
In [21]: x
Out[21]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
In [22]: x[idx]             # shuffled
Out[22]: 
array([[ 4,  5],
       [ 0,  1],
       [ 2,  3],
       [ 6,  7],
       [10, 11],
       [ 8,  9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1]       # recover original order
Out[26]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
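Recovering the original order works because argsort of a permutation is its inverse permutation. The sketch below, an illustration rather than part of the answer above, checks this and also builds the same inverse in O(n) by scattering 0..n-1 into the positions named by the index array (the name inv is hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

x = np.arange(12).reshape(6, 2)
idx = rng.permutation(len(x))   # a random shuffling of 0..5
shuffled = x[idx]

# argsort of a permutation gives its inverse.
idx1 = np.argsort(idx)

# Same inverse in O(n): position idx[k] of the inverse receives k.
inv = np.empty_like(idx)
inv[idx] = np.arange(len(idx))

restored = shuffled[idx1]       # back to the original row order
print(np.array_equal(restored, x))   # True
print(np.array_equal(idx1, inv))     # True
```

With idx = [2, 0, 1, 3, 5, 4] from the session above, both methods give [1, 2, 0, 3, 5, 4].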

Upvotes: 1
