Convert Tensorflow Dataset into 2 arrays containing images and labels

Question

I'm using TensorFlow 2.0 and I have a batched dataset which contains 968 images and a label (4 element array) for each:

dataSetSize = allDataSet.reduce(0, lambda x, _: x + 1).numpy()
allDataSet = allDataSet.shuffle(dataSetSize)
allDataSet = allDataSet.map(processPath, num_parallel_calls=tf.data.experimental.AUTOTUNE)
allDataSet = allDataSet.batch(10)
predictions = loadedModel.predict(allDataSet)

onlyImages = # how to create this?
onlyLabels = # how to create this?

# the 'map' function in my dataset returns a batch of images and their corresponding labels
for idx, (imageBatch, labelBatch) in enumerate(allDataSet) :
    # how to concatenate batches together?
    onlyImages = # ?
    onlyLabels = # ?

I need to separate this dataset into two numpy arrays. The first array should contain only the 968 images (shape: (968, 299, 299, 3)) and the second the 968 labels (shape: (968, 4)). How can I do that?

I checked a similar question here, but those examples seem to use TensorFlow 1.x and a different input type.

Size of dataset and types:

dataset size:  968

Timbus Calin · Accepted Answer

If I understand your question correctly, you need to concatenate the batches into a NumPy array as you iterate through your dataset. Note that calling .numpy() on a tensor during iteration converts it from a tf.Tensor to an np.ndarray.

The first option is np.concatenate. As per the documentation:
```
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0)  # stack b below a along the first axis
```
Output is:
```
array([[1, 2],
       [3, 4],
       [5, 6]])
```
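So, in your code, define an initial empty NumPy array and concatenate each imageBatch and labelBatch to it along axis=0. The empty array must have the same trailing dimensions as the batches, for example shape (0, 299, 299, 3) for the images and (0, 4) for the labels; a plain np.array([]) is one-dimensional and cannot be concatenated with them.
Alternatively, you could use np.vstack, which gives the same result for these arrays (np.vstack calls np.concatenate under the hood).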
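
The idea above can be sketched as follows. This is a minimal illustration, not the only way to do it: the helper name dataset_to_arrays is made up for this example, and a small list of NumPy batches stands in for the real dataset so the snippet runs without TensorFlow. It collects the batches in Python lists and concatenates once at the end, which avoids copying a growing array on every iteration. It assumes that np.asarray accepts the eager tensors yielded by a TF 2.x dataset; if it does not in your setup, call .numpy() on each batch instead.

```python
import numpy as np


def dataset_to_arrays(batches):
    """Collect (image_batch, label_batch) pairs into two NumPy arrays.

    `batches` is any iterable of pairs, e.g. a batched tf.data.Dataset
    iterated in eager mode (assumption) or a list of NumPy arrays.
    """
    image_chunks, label_chunks = [], []
    for image_batch, label_batch in batches:
        # np.asarray leaves NumPy arrays untouched and converts array-like
        # objects (assumed to include eager tensors) to np.ndarray.
        image_chunks.append(np.asarray(image_batch))
        label_chunks.append(np.asarray(label_batch))
    # A single concatenation along axis 0 joins all batches, including a
    # smaller final batch.
    return (np.concatenate(image_chunks, axis=0),
            np.concatenate(label_chunks, axis=0))


# Stand-in data (hypothetical): 25 tiny "images" in batches of 10, 10 and 5.
fake_images = np.zeros((25, 4, 4, 3), dtype=np.float32)
fake_labels = np.arange(100, dtype=np.float32).reshape(25, 4)
fake_batches = [(fake_images[i:i + 10], fake_labels[i:i + 10])
                for i in range(0, 25, 10)]

only_images, only_labels = dataset_to_arrays(fake_batches)
print(only_images.shape)  # (25, 4, 4, 3)
print(only_labels.shape)  # (25, 4)
```

With the dataset from the question, the call would presumably be onlyImages, onlyLabels = dataset_to_arrays(allDataSet). One caveat: shuffle() reshuffles on every pass by default, so arrays collected in a separate loop may not be in the same order as the output of an earlier predict() call. Predicting on the collected onlyImages array, or passing reshuffle_each_iteration=False to shuffle(), should keep the two aligned.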
