Reputation: 3718
Conceptually, I have two lists of equal length, one containing labels and the other data. And so I asked this question, not realising that what I really had was two numpy arrays, not two lists.
What I do have is a folder containing images such as cat_01.jpg, cat_02.jpg, dog_01.jpg, dog_02.jpg, dog_03.jpg, fish_01.jpg, ..., tiger_03.jpg, zebra_01.jpg and zebra_02.jpg. I also have a working program that reads them in, parses a portion of each file name into a labels array, and loads the corresponding image data into my data array, so that I end up with something like:
>>> labels
array(['cat', 'cat', 'dog', ..., 'tiger', 'zebra', 'zebra' ])
>>> type( data )
<class 'numpy.ndarray'>
>>> data[0][0][0]
array([78, 88, 98])
That makes sense: for each sample, data[ sample ][ row ][ column ] is the (R,G,B) value of the pixel at (column, row).
I want to specify a search label such as 'dog', and (conceptually) use it to generate two "sub-lists": the first containing all the (identical) matching labels in the labels list, and the other containing the associated image data from data. But rather than lists, I need to retain the original data format, in this case numpy arrays (but if there is a more general, data-insensitive approach, I'd love to know about it). How can I do this?
Update: here's some specific test code to recreate the situation I am confronting, and with a sketch of a solution based on Stephen Rauch's answer:
import os, glob
from PIL import Image
import numpy as np
import pandas as pd # not critical to question
def load_image(file):
    # read one image file into a float array of shape (rows, columns, 3)
    data = np.asarray(Image.open(file),dtype="float")
    return data
MasterClass = ['cat','dog','fsh','grf','hrs','leo','owl','pig','tgr','zbr']
os.chdir('data\\animals')
filelist = glob.glob("*.jpg")
full_labels = np.array([MasterClass.index(os.path.basename(fname)[:3]) for fname in filelist])
full_images = np.array([load_image(fname) for fname in filelist])
# The following sketches a solution, but it leads to incompatible data types.
# That is, test_images and/or test_labels differ from full_images and
# full_labels with regard to the data types involved.
df = pd.DataFrame(dict(label=list(full_labels),data=list(full_images)))
criteria = df['label'] == MasterClass.index('dog')
test_labels = np.array(df[criteria]['label'])
test_images = np.array(df[criteria]['data'])
Two notes:
In using file names like tiger_03.jpg, I was simplifying reality. In truth the code above expects file names like tgr03.jpg, and the list of labels I end up working with is not even ['cat', 'cat', 'dog', ...] but is instead a list of indices into the MasterClass list, that is, [0, 0, 1, ...].
The question is: how do I get test_labels and test_images to be in an identical format to the original full_labels and full_images, but based on a selection criterion like the one sketched above? This code as it stands does not achieve this level of data compatibility; it does not behave as a strict "slice".
Upvotes: 0
Views: 165
Reputation: 3718
Based on Stephen Rauch's answer to my earlier simpler question, it is possible to solve this as follows:
# assume full_labels and full_images exist as per test code in updated question
tuples = (x for x in zip(list(full_labels),list(full_images)) if x[0] == MasterClass.index('dog'))
xlabels,ximages = map(list, zip(*tuples))
test_labels = np.array(xlabels)
test_images = np.array(ximages)
Upvotes: 0
Reputation: 49784
If you can use pandas, it is VERY good at this sort of thing.
If you already have a dataframe, you can simply do:
# build a logical condition
have_dog = df['animal_label'] == 'dog'
# select the data when that condition is true
print(df[have_dog])
Test code:
import pandas as pd
import numpy as np
animal_label = ['cat', 'cat', 'dog', 'dog', 'dog', 'fish', 'fish', 'giraffe']
data = [0.3, 0.1, 0.9, 0.5, 0.4, 0.3, 0.2, 0.8]
data = [np.array((x,) * 3) for x in data]
df = pd.DataFrame(dict(animal_label=animal_label, data=data))
print(df)
have_dog = df['animal_label'] == 'dog'
print(df[have_dog])
Results:
  animal_label             data
0          cat  [0.3, 0.3, 0.3]
1          cat  [0.1, 0.1, 0.1]
2          dog  [0.9, 0.9, 0.9]
3          dog  [0.5, 0.5, 0.5]
4          dog  [0.4, 0.4, 0.4]
5         fish  [0.3, 0.3, 0.3]
6         fish  [0.2, 0.2, 0.2]
7      giraffe  [0.8, 0.8, 0.8]

  animal_label             data
2          dog  [0.9, 0.9, 0.9]
3          dog  [0.5, 0.5, 0.5]
4          dog  [0.4, 0.4, 0.4]
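If the selected rows need to go back into a plain numpy array (as the question asks), one possible follow-up is sketched below. It is not part of the answer above: the tiny `labels` and `images` arrays are hypothetical stand-ins for the question's data, and the sketch assumes every image has the same shape. The point it illustrates is that a column of arrays comes back from pandas as a 1-D array of objects, whereas `np.stack` rebuilds a regular N-dimensional array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: four 2x2 RGB "images" with integer class labels.
labels = np.array([0, 1, 1, 2])
images = np.arange(4 * 2 * 2 * 3, dtype=float).reshape(4, 2, 2, 3)

df = pd.DataFrame(dict(label=list(labels), data=list(images)))
have_dog = df['label'] == 1  # assume class index 1 means 'dog'

# Converting the filtered column directly gives a 1-D array whose
# elements are themselves arrays (dtype=object), not a 4-D float array.
as_objects = np.array(df[have_dog]['data'])

# np.stack joins the per-row arrays along a new first axis, which
# restores the (samples, rows, columns, channels) layout.
test_images = np.stack(df.loc[have_dog, 'data'].to_numpy())
test_labels = df.loc[have_dog, 'label'].to_numpy()
```

Here `as_objects` has shape `(2,)` while `test_images` has shape `(2, 2, 2, 3)`, matching the layout of the original `images` array.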
Upvotes: 1
Reputation: 1118
If I understand your problem correctly, this would be done by slicing like this:
selector = 'fish'
matching_labels = labels[labels==selector]
matching_data = data[labels==selector]
Alternatively, you could use the approach from the answer to your previous question and convert the list alist to a numpy array with alist = numpy.array(alist).
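Applied to the integer labels and image array from the updated question, the same boolean-mask slicing might look like the sketch below. The small arrays here are hypothetical stand-ins for `full_labels` and `full_images`, and it assumes all images share one shape so that `full_images` is a regular 4-D array rather than an array of objects.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays: five tiny 2x2 RGB
# "images" and their integer class indices into MasterClass.
MasterClass = ['cat', 'dog', 'fsh']
full_labels = np.array([0, 0, 1, 1, 2])
full_images = np.arange(5 * 2 * 2 * 3, dtype=float).reshape(5, 2, 2, 3)

# Build the boolean mask once and reuse it for both arrays.
mask = full_labels == MasterClass.index('dog')

# Indexing with a boolean array selects along the first axis and keeps
# the dtype and the remaining dimensions of the original array.
test_labels = full_labels[mask]
test_images = full_images[mask]
```

Because the mask only selects along the first axis, `test_images` keeps the dtype of `full_images` and has shape `(2, 2, 2, 3)` here, so it should be usable wherever the full array was.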
Upvotes: 0