How to merge two tf.data.Dataset into one, alternating elements with known ratio

Question

I have two tf.data.Dataset objects, call them d1 and d2, and I want to construct another dataset that contains the elements of d1 and d2, alternating. It is easier to explain with an example. Let's say:

d1 = [0,1,2,3,4,5,6,7,...] # it is not a list, just the content of the dataset

d2 = ["a", "b", "c", "d",... ]

and I have a pair specifying the number of consecutive elements to take from each dataset, for example (3, 1).

The result that I am looking for is:

result = [0, 1, 2, "a", 3, 4, 5, "b", 6, 7, 8, "c"...]

EDIT: d1 and d2 are objects of the class tf.data.Dataset. The example above shows just the content of the datasets but it is not code.

Michał Słapek · Accepted Answer

Assuming TF 2.0. The trick is to batch each dataset by its share of the ratio, zip the batches, concatenate each pair of batches, and then unbatch.

import tensorflow as tf 

# input datasets
d1 = tf.data.Dataset.from_tensors([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).unbatch()
d2 = tf.data.Dataset.from_tensors([100, 101, 102]).unbatch()
# replaced letters with numbers to make tensor types match

# define ratio
r1 = 3
r2 = 1

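# group each dataset into chunks of the requested size
b1 = d1.batch(r1)
b2 = d2.batch(r2)

# pair the chunks, join each pair into one tensor of r1 + r2 elements,
# then flatten back to single elements
zipped = tf.data.Dataset.zip((b1, b2)).map(lambda x, y: tf.concat((x, y), axis=0))
result = zipped.unbatch()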

Output:

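In [9]: list(result)
Out[9]:
[<tf.Tensor: shape=(), dtype=int32, numpy=0>,
 <tf.Tensor: shape=(), dtype=int32, numpy=1>,
 <tf.Tensor: shape=(), dtype=int32, numpy=2>,
 <tf.Tensor: shape=(), dtype=int32, numpy=100>,
 <tf.Tensor: shape=(), dtype=int32, numpy=3>,
 <tf.Tensor: shape=(), dtype=int32, numpy=4>,
 <tf.Tensor: shape=(), dtype=int32, numpy=5>,
 <tf.Tensor: shape=(), dtype=int32, numpy=101>,
 <tf.Tensor: shape=(), dtype=int32, numpy=6>,
 <tf.Tensor: shape=(), dtype=int32, numpy=7>,
 <tf.Tensor: shape=(), dtype=int32, numpy=8>,
 <tf.Tensor: shape=(), dtype=int32, numpy=102>]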
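
The same batch, zip, concat, unbatch steps can be wrapped in a reusable function. This is only a sketch: merge_with_ratio is a hypothetical name, not a TensorFlow API, and passing drop_remainder=True is an added assumption so that a short final batch never produces a group with the wrong ratio. It still requires both datasets to share a dtype and element shape.

```python
import tensorflow as tf


def merge_with_ratio(d1, d2, r1, r2):
    """Yield r1 elements of d1, then r2 elements of d2, repeatedly.

    Hypothetical helper, not part of TensorFlow. Both datasets must have
    the same dtype and element shape so the batches can be concatenated.
    """
    # Assumption: drop_remainder=True discards a short final batch, so
    # every group in the output has exactly r1 + r2 elements.
    b1 = d1.batch(r1, drop_remainder=True)
    b2 = d2.batch(r2, drop_remainder=True)
    # zip stops at the shorter of the two batched datasets.
    groups = tf.data.Dataset.zip((b1, b2)).map(
        lambda x, y: tf.concat((x, y), axis=0)
    )
    return groups.unbatch()


d1 = tf.data.Dataset.from_tensor_slices([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
d2 = tf.data.Dataset.from_tensor_slices([100, 101, 102, 103])

merged = [int(x.numpy()) for x in merge_with_ratio(d1, d2, 3, 1)]
print(merged)  # [0, 1, 2, 100, 3, 4, 5, 101, 6, 7, 8, 102]
```

Here d1 has one element too many (9) and d2 has one too many (103); both are left out, and the output contains only complete 3:1 groups.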

Note: zip stops as soon as either batched dataset runs out, so this solution may drop elements at the end of d1 or d2. To keep every element, their lengths must be adjusted to the ratio.
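
One way to read "adjusted to the ratio" is that both datasets should hold a whole number of groups. The arithmetic below is a plain-Python sketch, assuming the dataset lengths n1 and n2 are known in advance; complete_groups is a hypothetical helper name used only for illustration.

```python
def complete_groups(n1, n2, r1, r2):
    """Return (groups, used1, used2) for datasets of length n1 and n2.

    Hypothetical helper: counts how many full groups of r1 + r2 elements
    can be formed and how many elements of each dataset they consume.
    """
    groups = min(n1 // r1, n2 // r2)
    return groups, groups * r1, groups * r2


# Lengths from the example above: d1 has 10 elements, d2 has 3, ratio (3, 1).
groups, used1, used2 = complete_groups(10, 3, 3, 1)
print(groups, used1, used2)    # 3 9 3
print(10 - used1, 3 - used2)   # 1 0  (one element of d1 is left over)
```

With these numbers, calling d1.take(used1) and d2.take(used2) before batching would make the trimming explicit instead of leaving it to zip.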
