Andrew Wiedenmann

Reputation: 312

Using a dataset of filenames, create a dataset that maps images to label tuples

I create a TensorFlow dataset from the filenames of many images in a folder. The images are named [index].jpg, where index is an integer that identifies the image. I also have a dictionary that maps each index, as a string, to its labels, stored as a tuple. How can I use tf.data.Dataset.map to map each index to its label tuple?

Here's the map_func I am trying to pass to the map function:

def grabImages(filepath):
   index = getIndexFromFilePath(filepath)
   img = tf.io.read_file(filepath)
   img = translateImage(img)
   dictionary = getLabelDictionary()
   return index, img

Here, dictionary is the index-to-labels dict, index is the index extracted from the file path as a tf.Tensor, and img is the preprocessed image read from that path.

This returns a dataset in which each index, as a tensor, is paired with the corresponding image. Is there a way to look up the labels for an index with something like dictionary[index]? Basically, I need the string content of the index tensor.

I have tried calling .numpy() and .eval() (with the current session) inside grabImages, but neither works.

Upvotes: 1

Views: 660

Answers (1)

user11530462


Here is an example of how to get the string content of a tensor inside a function passed to tf.data.Dataset.map.

These are the steps the code below follows.

  1. Wrap the mapped function in tf.py_function, as in tf.py_function(get_path, [x], [tf.string]), so that it runs eagerly instead of being traced into a graph. See the tf.py_function documentation for details.
  2. Inside the wrapped function, get the string with bytes.decode(file_path.numpy()), or equivalently file_path.numpy().decode().

Code -

# %tensorflow_version 2.x  (Colab-only magic; not needed outside Colab)
import tensorflow as tf

def get_path(file_path):
    # Runs eagerly inside tf.py_function, so .numpy() is available here.
    path = file_path.numpy().decode()
    print("file_path: ", path, type(path))
    return file_path

train_dataset = tf.data.Dataset.list_files('/content/bird.jpg')
train_dataset = train_dataset.map(lambda x: tf.py_function(get_path, [x], [tf.string]))

for one_element in train_dataset:
    print(one_element)

Output -

file_path:  /content/bird.jpg <class 'str'>
(<tf.Tensor: shape=(), dtype=string, numpy=b'/content/bird.jpg'>,)
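Each element above is a one-item tuple because Tout was passed as a list, [tf.string]. As a small sketch, assuming a bare tensor is preferred, a single dtype can be passed instead. The identity helper is a placeholder, and the dataset is built from a string so that no file has to exist.

```python
import tensorflow as tf

def identity(file_path):
    # Placeholder for any eager Python logic.
    return file_path

paths = tf.data.Dataset.from_tensor_slices(["/content/bird.jpg"])

# Tout given as a list: each dataset element is a 1-tuple of tensors.
as_list = paths.map(lambda x: tf.py_function(identity, [x], [tf.string]))
# Tout given as a single dtype: each dataset element is a bare tensor.
as_single = paths.map(lambda x: tf.py_function(identity, [x], tf.string))

first_from_list = next(iter(as_list))
first_from_single = next(iter(as_single))
```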

Hope this answers your question.
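To apply the same idea to the index-to-label lookup in the question, something like the following may work. It is a minimal sketch under assumptions, not a drop-in solution: the labels dictionary, the lookup_label and with_label helpers, the file paths, and the choice of two float values per label are all hypothetical, and image loading is left out so that the example runs without files on disk.

```python
import os
import tensorflow as tf

# Hypothetical label dictionary: string index -> label tuple.
labels = {"0": (1.0, 0.0), "1": (0.0, 1.0)}

def lookup_label(file_path):
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
    path = file_path.numpy().decode("utf-8")
    # "/data/1.jpg" -> "1"
    index = os.path.splitext(os.path.basename(path))[0]
    return tf.constant(labels[index], dtype=tf.float32)

def with_label(file_path):
    label = tf.py_function(lookup_label, [file_path], tf.float32)
    # py_function drops static shape information; restore it (assumed length 2).
    label.set_shape([2])
    return file_path, label

# Hypothetical paths; the files do not need to exist for this sketch.
paths = ["/data/0.jpg", "/data/1.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(paths).map(with_label)

results = [
    (p.numpy().decode("utf-8"), tuple(l.numpy().tolist()))
    for p, l in dataset
]
```

In a real pipeline, with_label would also read and preprocess the image (for example with tf.io.read_file) and return it in place of the path. Those calls are ordinary TensorFlow ops and can stay outside tf.py_function.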

Upvotes: 2
