Bipin M

Reputation: 33

tf.data.Dataset - behavior of map() and cache() methods

Questions regarding TensorFlow Datasets:
https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map
https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache

  1. How does the map function actually work? The print(rand) in mapfn() prints just one value, but print(x) prints values as expected.
  2. Why does the map function behave differently from Python's built-in map()?
    • dataset.map(mapfn) prints only 1 value
    • map(mapfn, numbers) prints 4 values
  3. When I get the same result for x and y below, what is the purpose of using dataset.cache()?
import tensorflow as tf
from random import random
from math import ceil

def mapfn(x):
    rand = ceil(5*random())
    print(rand)
    return x**rand

dataset = tf.data.Dataset.range(50)
dataset = dataset.map(mapfn)
# dataset = dataset.cache()

x = list(dataset.as_numpy_iterator())
print(x)

y = list(dataset.as_numpy_iterator())
print(y)

vs

def mapfn(n):
    rand = ceil(5*random())
    print(rand)
    return n**rand
  
numbers = [1, 2, 3, 4]
result = map(mapfn, numbers)
print(list(result))

Upvotes: 3

Views: 1079

Answers (1)

Laplace Ricky

Reputation: 1687

When you pass mapfn into dataset.map(), mapfn is converted into a TensorFlow graph, and print() does not work as expected in graph mode: it only runs during the tracing stage, so if mapfn is traced once, it prints only once.

To print debug messages properly in graph mode, use tf.print() instead.

If cache() is attached after dataset.map(mapfn), it will cache the mapped values, and the cached values will be used from then on (there must be enough memory to hold all of them).

In other words, after the first full pass over the dataset, mapfn is never called again.

See Example:

import tensorflow as tf

ds = tf.data.Dataset.range(3)

def mapfn(x):
  tf.print('I am called')
  return tf.pow(x, 2)  # mapfn needs to be graph-mode compatible

ds = ds.map(mapfn)
print('First loop:')
for x in ds:
  print(x)
print()
print('Second loop:')
for x in ds:
  print(x)
print()

ds = ds.cache()
print('After cache():')
print('First loop:')
for x in ds:
  print(x)
print()
print('Second loop:')
for x in ds:
  print(x)
print()

'''
First loop:
I am called
tf.Tensor(0, shape=(), dtype=int64)
I am called
tf.Tensor(1, shape=(), dtype=int64)
I am called
tf.Tensor(4, shape=(), dtype=int64)

Second loop:
I am called
tf.Tensor(0, shape=(), dtype=int64)
I am called
tf.Tensor(1, shape=(), dtype=int64)
I am called
tf.Tensor(4, shape=(), dtype=int64)

After cache():
First loop:
I am called
tf.Tensor(0, shape=(), dtype=int64)
I am called
tf.Tensor(1, shape=(), dtype=int64)
I am called
tf.Tensor(4, shape=(), dtype=int64)

Second loop:
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
'''
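As a side note, the questioner's original mapfn can be made graph-compatible by drawing the random exponent with a TensorFlow op instead of Python's random module, and printing with tf.print. A minimal sketch (the minval/maxval choice of 1–5 mirrors ceil(5*random()) in the question):

```python
import tensorflow as tf

def mapfn(x):
    # Draw the exponent with a TF op so a fresh value is produced per
    # element at execution time, not once at tracing time.
    rand = tf.random.uniform([], minval=1, maxval=6, dtype=tf.int64)
    tf.print('exponent:', rand)  # executes for every element
    return x ** rand

dataset = tf.data.Dataset.range(5).map(mapfn)
print(list(dataset.as_numpy_iterator()))
```

Because the random op now runs per element on every pass, iterating twice generally gives different values for x and y — and adding .cache() after the map freezes the first pass's results, which is exactly the purpose asked about in question 3.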

Upvotes: 4
