Reputation: 327
I have two tf.data.Dataset objects, let's call them d1 and d2, and I want to construct another dataset that contains the elements of d1 and d2 alternating. It is easier to explain with an example.
Let's say:
d1 = [0,1,2,3,4,5,6,7,...] # it is not a list, just the content of the dataset
d2 = ["a", "b", "c", "d",... ]
and I have a pair specifying the number of consecutive elements to take from each dataset (for example (3, 1)).
The result that I am looking for is:
result = [0, 1, 2, "a", 3, 4, 5, "b", 6, 7, 8, "c"...]
EDIT: d1 and d2 are objects of the class tf.data.Dataset. The example above shows just the content of the datasets but it is not code.
Upvotes: 2
Views: 3326
Reputation: 1572
Assuming TF 2.0. The trick is to batch each dataset by its share of the ratio, zip the two batched datasets, concatenate each pair of batches, and then unbatch.
import tensorflow as tf
# input datasets
d1 = tf.data.Dataset.from_tensors([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).unbatch()
d2 = tf.data.Dataset.from_tensors([100, 101, 102]).unbatch()
# replaced letters with numbers to make tensor types match
# define ratio
r1 = 3
r2 = 1
b1 = d1.batch(r1)
b2 = d2.batch(r2)
zipped = tf.data.Dataset.zip((b1, b2)).map(lambda x, y: tf.concat((x, y), axis=0))
result = zipped.unbatch()
Output:
In [9]: list(result)
Out[9]:
[<tf.Tensor: id=224, shape=(), dtype=int32, numpy=0>,
<tf.Tensor: id=225, shape=(), dtype=int32, numpy=1>,
<tf.Tensor: id=226, shape=(), dtype=int32, numpy=2>,
<tf.Tensor: id=227, shape=(), dtype=int32, numpy=100>,
<tf.Tensor: id=228, shape=(), dtype=int32, numpy=3>,
<tf.Tensor: id=229, shape=(), dtype=int32, numpy=4>,
<tf.Tensor: id=230, shape=(), dtype=int32, numpy=5>,
<tf.Tensor: id=231, shape=(), dtype=int32, numpy=101>,
<tf.Tensor: id=232, shape=(), dtype=int32, numpy=6>,
<tf.Tensor: id=233, shape=(), dtype=int32, numpy=7>,
<tf.Tensor: id=234, shape=(), dtype=int32, numpy=8>,
<tf.Tensor: id=235, shape=(), dtype=int32, numpy=102>]
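Note: This solution might drop some elements at the end of d1 or d2, because zip stops as soon as either batched dataset runs out (in the output above, 9 from d1 is missing). Their lengths must be adjusted to the ratio if every element has to be kept.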
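To make the truncation in the note concrete, here is a minimal plain-Python model of the same batch, zip, concatenate, unbatch idea. It is only a sketch of the logic, not TensorFlow code: the helper names batched and interleave_by_ratio and the keep_remainder flag are hypothetical and are not part of tf.data.

```python
from itertools import islice, zip_longest


def batched(iterable, size):
    """Yield lists of up to `size` items (the last list may be shorter)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def interleave_by_ratio(a, b, r1, r2, keep_remainder=False):
    """Take r1 items from a, then r2 items from b, and repeat.

    With keep_remainder=False the pairing stops when either input runs out
    of batches, which mirrors the truncation described in the note.
    With keep_remainder=True leftover batches are appended as well.
    """
    if keep_remainder:
        pairs = zip_longest(batched(a, r1), batched(b, r2), fillvalue=[])
    else:
        pairs = zip(batched(a, r1), batched(b, r2))
    return [item for x, y in pairs for item in x + y]


d1 = range(10)
d2 = [100, 101, 102]

truncated = interleave_by_ratio(d1, d2, 3, 1)
full = interleave_by_ratio(d1, d2, 3, 1, keep_remainder=True)

print(truncated)  # the trailing 9 from d1 is dropped
print(full)       # the trailing 9 is kept at the end
```

The strict variant matches the TensorFlow output above element for element. The keep_remainder variant is one possible way to decide what happens to leftovers, under the assumption that appending them at the end is acceptable.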
Upvotes: 5
Reputation: 2492
import pandas as pd

# Example frames, reconstructed from the printed output below
d1 = pd.DataFrame({"Col1": range(11), "Col2": range(0, 110, 10)})
d2 = pd.DataFrame({"Col1": ["a", "b", "c"], "Col2": ["A", "B", "C"]})

print(d1)
print("---------------------------")
print(d2)
print("---------------------------")

def interweave(x, d1, y, d2):
    """
    x = how many rows of d1 to add before adding rows from d2
    d1 = the d1 DataFrame
    y = how many rows of d2 to add before adding rows from d1 again
    d2 = the d2 DataFrame
    """
    rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
    countx = 0
    county = 0
    length = max(len(d1), len(d2))
    for count in range(length):
        for i in range(countx, countx + x):
            try:  # prevents a halt on unequal or indivisible lengths
                row = d1.iloc[i]
            except IndexError:
                break
            rows.append(row)
            countx += 1
        for j in range(county, county + y):
            try:  # prevents a halt on unequal or indivisible lengths
                row = d2.iloc[j]
            except IndexError:
                break
            rows.append(row)
            county += 1
    d3 = pd.DataFrame(rows).reset_index(drop=True)
    return d3

d3 = interweave(3, d1, 1, d2)
print(d3)
OUTPUT:
    Col1  Col2
0      0     0
1      1    10
2      2    20
3      3    30
4      4    40
5      5    50
6      6    60
7      7    70
8      8    80
9      9    90
10    10   100
---------------------------
  Col1 Col2
0    a    A
1    b    B
2    c    C
---------------------------
   Col1 Col2
0     0    0
1     1   10
2     2   20
3     a    A
4     3   30
5     4   40
6     5   50
7     b    B
8     6   60
9     7   70
10    8   80
11    c    C
12    9   90
13   10  100
Upvotes: 0