Reputation: 65
I want to build a tf.data.Dataset pipeline from a generator that uses pandas DataFrames to find image paths on disk and load the images into the pipeline. TensorFlow won't allow me to do this and raises a "Can't convert non-rectangular Python sequence to Tensor." error.
I tried passing .values in the args argument of tf.data.Dataset.from_generator, but then I would have to rewrite all the code that uses the DataFrames to find the paths to the right images.
Here's the command to generate the Dataset:
train_dataset = tf.data.Dataset.from_generator(make_triplet_dataset, (tf.float32, tf.float32, tf.float32), args = ([train_families, train_positive_relations]))
And here's the make_triplet_dataset generator (which takes pandas DataFrames as arguments):
    import os
    import random

    import cv2

    # make_triplet and preprocess_input are defined elsewhere in my code

    def make_triplet_dataset(families, positive_relations):
        """
        Dataset generator that yields a random anchor, positive and negative image on each iteration
        """
        while True:
            # generates a random triplet
            anchor, positive, negative = make_triplet(families, positive_relations)
            # builds the paths for the randomly chosen images
            path_anchor_img = 'train/' + anchor + '/' + random.choice(os.listdir('train/' + anchor))
            path_positive_img = 'train/' + positive + '/' + random.choice(os.listdir('train/' + positive))
            path_negative_img = 'train/' + negative + '/' + random.choice(os.listdir('train/' + negative))
            # loads and preprocesses the images to be used in the algorithm
            anchor_img = preprocess_input(cv2.imread(path_anchor_img))  # preprocess_input does an (img / 127.5) - 1 operation
            positive_img = preprocess_input(cv2.imread(path_positive_img))
            negative_img = preprocess_input(cv2.imread(path_negative_img))
            yield (anchor_img, positive_img, negative_img)
The function make_triplet is a nested function that uses pandas DataFrames to generate paths to the images.
I want to be able to build a TensorFlow Dataset from a generator that yields the images in triplets, using pandas DataFrames to find the paths to those images and load them into the pipeline. If anyone can help, it would be appreciated.
Upvotes: 2
Views: 1597
Reputation: 65
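Found an answer. Instead of passing the pandas DataFrames to the generator function through the args parameter of tf.data.Dataset.from_generator, I used a lambda to pass them to the generator function itself. As far as I can tell, everything in args gets converted to tensors before it reaches the generator, which is what fails for DataFrames; the lambda hands over the original Python objects untouched: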
train_dataset = tf.data.Dataset.from_generator(lambda: make_triplet_dataset(train_families, train_positive_relations), output_types = (tf.float32, tf.float32, tf.float32))
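For anyone who wants to try the pattern without the image data, here is a minimal sketch of the same idea. It is not the original code: the generator is a toy stand-in that yields family names instead of images, the families and positive_relations values are made up, and functools.partial is used as an equivalent of the lambda. The build_dataset helper is hypothetical and imports TensorFlow lazily, so the rest runs without it; it assumes a TensorFlow version that supports output_signature, which newer releases prefer over output_types.

```python
import functools
import random


def make_triplet_dataset(families, positive_relations):
    """Toy stand-in for the real generator: yields (anchor, positive, negative) names forever."""
    while True:
        anchor, positive = random.choice(positive_relations)
        # Any family that is neither the anchor nor the positive can be the negative.
        negative = random.choice([f for f in families if f not in (anchor, positive)])
        yield anchor, positive, negative


# Made-up stand-ins for train_families / train_positive_relations.
# Any Python object works here, including pandas DataFrames.
families = ["F0001", "F0002", "F0003"]
positive_relations = [("F0001", "F0002")]

# Zero-argument callable with the arguments already bound, equivalent to
# lambda: make_triplet_dataset(families, positive_relations)
triplet_source = functools.partial(make_triplet_dataset, families, positive_relations)

# Each call returns a fresh generator, which is what from_generator expects.
first = next(triplet_source())


def build_dataset():
    """Hypothetical helper: wraps the bound generator in a tf.data.Dataset."""
    import tensorflow as tf  # imported lazily so the code above runs without it

    name_spec = tf.TensorSpec(shape=(), dtype=tf.string)
    return tf.data.Dataset.from_generator(
        triplet_source,
        output_signature=(name_spec, name_spec, name_spec),
    )
```

With real images, each element of output_signature would presumably be something like tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32) instead of a string spec. Calling build_dataset().take(1) should then pull a single triplet through the pipeline.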
Upvotes: 2