Filtering a numpy array using another array of labels

Question

Given two NumPy arrays:

images.shape: (60000, 784) # An array containing 60000 images
labels.shape: (60000, 10)  # An array of labels for each image

Each row of labels contains a 1 at a particular index to indicate the class of the corresponding example in images. For example, [0 0 1 0 0 0 0 0 0 0] indicates that the example belongs to class 2 (assuming class indexing starts from 0).

I am trying to efficiently separate images so that I can manipulate all images belonging to a particular class at once. The most obvious solution would be to use a for loop (as follows). However, I'm not sure how to filter images such that only those with the appropriate labels are returned.

for i in range(labels.shape[1]):
    class_images = ...  # (?) Array containing all images that belong to class i

As an aside, I'm also wondering if there are even more efficient approaches that would eliminate the use of the for loop.

Paul Panzer · Accepted Answer

One way would be to convert your label array to bool and use it for indexing:

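classes = []
# Interpret the one-hot labels as a boolean mask: True where the row belongs to the class
blabels = labels.astype(bool)
for i in range(10):
    # Column i of the mask selects every row of images that belongs to class i
    classes.append(images[blabels[:, i], :])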

Or as a one-liner using a list comprehension:

classes = [images[l.astype(bool), :] for l in labels.T]
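On the aside about avoiding the for loop: one possible approach, sketched here rather than taken from the answer above, is to recover each row's class index with argmax, sort the rows by class once, and split the sorted array at the class boundaries. The small random arrays and the names n_samples, n_features, n_classes, class_ids, idx, order and counts are assumptions made up for this illustration; with the real data, images and labels would be the arrays from the question. Whether this is actually faster than the loop would need to be measured, since with only ten classes the loop may not be a bottleneck.

```python
import numpy as np

# Small synthetic stand-in for the real data (sizes are assumptions for this demo).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 12, 4, 3
class_ids = rng.integers(0, n_classes, size=n_samples)
labels = np.eye(n_classes, dtype=int)[class_ids]  # one-hot rows, as in the question
images = rng.random((n_samples, n_features))

# Recover the class index of every row from its one-hot label.
idx = labels.argmax(axis=1)

# A stable sort groups rows by class and keeps the original order within each class.
order = np.argsort(idx, kind="stable")

# Per-class counts; their cumulative sums mark where one class ends and the next begins.
counts = np.bincount(idx, minlength=n_classes)
classes = np.split(images[order], np.cumsum(counts)[:-1])

# classes[i] now holds all images of class i, matching the boolean-mask result.
```

As with the list comprehension, classes is a list with one array per class. The per-class arrays generally have different lengths, so they cannot be stacked into a single regular array.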
