Reputation: 1122
I am trying to create an input pipeline using the tf.data API. I have 3D data; with plain NumPy operations I would have ended up with an array of dimensions [?,256x256x3x100], which can be thought of as 100 frames of size 256x256x3 each.
    import glob
    import os
    import numpy as np
    import tensorflow.compat.v1 as tf

    def readfile(filenames):
        flag = 0
        for name in filenames:
            string = tf.read_file(name)
            image = tf.image.decode_image(string, channels=3)
            if flag == 0:
                bunch = image
                flag = 1
            else:
                bunch = tf.concat([bunch,image],1)
        return bunch

    with tf.device("/cpu:0"):
        train_files = []
        for s in [x[0] for x in os.walk("path/to/data/folders")]:
            if(s == "path/to/data/folders"):
                continue
            train_files.append(glob.glob(s+"/*.png"))
        # shape of train_files is [5,100]
        train_dataset = tf.data.Dataset.from_tensor_slices(train_files)
        train_dataset = train_dataset.map(readfile, num_parallel_calls=16)
I think the error occurs because 'bunch' changes size inside the for loop. Error:
ValueError Traceback (most recent call last)
<ipython-input-13-c2f88ca344dc> in <module>
22 train_dataset = train_dataset.map(
---> 23 readfile, num_parallel_calls=16)
ValueError: in converted code:
ValueError: TensorFlow requires that the following symbols must be defined before the loop: ('bunch',)
How do I read the data correctly?
EDIT
What worked for me:
    def readfile(filenames):
        flag = 0
        name = filenames[0]
        string = tf.read_file(name)
        image = tf.image.decode_image(string, channels=3)
        bunch = image
        for name in filenames:
            string = tf.read_file(name)
            image = tf.image.decode_image(string, channels=3)
            if flag == 0:
                bunch = image
                flag = 1
            else:
                bunch = tf.concat([bunch,image],1)
        return bunch
So I'm not sure why it is necessary to initialise bunch before the loop, when the first iteration should take care of that with bunch = image. It might be because flag is not defined as a tensor, so bunch = image is never actually run?
Upvotes: 2
Views: 2137
Reputation: 3472
The variable bunch is first assigned inside the for loop in readfile(), which causes the error: when the function is converted to graph code, a variable that is carried across loop iterations must already be defined before the loop. A fix is to move the initialisation of bunch outside the loop. Code sample follows:
    import glob
    import os
    import numpy as np
    import tensorflow.compat.v1 as tf

    def readfile(filenames):
        flag = 0
        bunch = <some_appropriate_initialization>
        for name in filenames:
            string = tf.read_file(name)
            image = tf.image.decode_image(string, channels=3)
            if flag == 0:
                bunch = image
                flag = 1
            else:
                bunch = tf.concat([bunch,image],1)
        return bunch

    # Rest of the code
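
The rule can also be reproduced without any image files. What follows is only a minimal sketch, assuming TensorFlow 2.x, where tf.function applies the same AutoGraph conversion that Dataset.map applies to its function. The name concat_rows and the toy input are made up for illustration. Because the tensor grows on every iteration, the sketch also declares its length as unknown through shape_invariants:

```python
import tensorflow as tf  # assumes TensorFlow 2.x


@tf.function
def concat_rows(rows):
    # Defined before the loop, so it can be carried from one iteration to the next.
    out = rows[0]
    for i in tf.range(1, tf.shape(rows)[0]):
        # `out` grows on every iteration, so its length is declared as unknown.
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(out, tf.TensorShape([None]))]
        )
        out = tf.concat([out, rows[i]], axis=0)
    return out


result = concat_rows(tf.constant([[1, 2], [3, 4], [5, 6]]))
print(result.numpy())
```

Removing the `out = rows[0]` line and assigning `out` only inside the loop should trigger the same kind of "must be defined before the loop" error as in the question.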
Upvotes: 1
Reputation: 608
You can't use arbitrary Python code inside a dataset.map function, which is readfile in your case. There are two ways to solve this:
1. Keep the readfile code as it is and wrap it in tf.py_function. It then runs eagerly, so you can write any Python logic as normal.
2. Rewrite readfile using only TensorFlow functions for the transformation. Performance-wise this is much better than using tf.py_function.
You can find an example on both at https://www.tensorflow.org/api_docs/python/tf/py_function
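
As a rough illustration of the two options, here is a sketch assuming TensorFlow 2.x with eager execution and PNG input. The function names and the make_demo_files helper (which only writes tiny placeholder images so the snippet is self-contained) are hypothetical, not part of the question's code. Note that the tf.map_fn variant stacks the frames on a new leading axis rather than concatenating them along the width, which may or may not be the layout you want:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x with eager execution


def load_with_py_function(filenames):
    """Option 1: keep the Python loop, but run it eagerly via tf.py_function."""

    def _load(names):
        # Runs eagerly, so ordinary Python iteration over the tensor is allowed.
        frames = [tf.io.decode_png(tf.io.read_file(n), channels=3) for n in names]
        return tf.concat(frames, axis=1)  # side by side along the width axis

    bunch = tf.py_function(_load, [filenames], tf.uint8)
    bunch.set_shape([None, None, 3])  # py_function drops static shape information
    return bunch


def load_with_tf_ops(filenames):
    """Option 2: TensorFlow ops only; tf.map_fn replaces the Python loop."""
    return tf.map_fn(
        lambda n: tf.io.decode_png(tf.io.read_file(n), channels=3),
        filenames,
        fn_output_signature=tf.uint8,
    )  # stacked frames: [num_frames, height, width, 3]


def make_demo_files(root, n_samples=2, n_frames=3, height=4, width=5):
    """Hypothetical helper: writes tiny PNGs so the sketch is self-contained."""
    table = []
    for s in range(n_samples):
        folder = os.path.join(root, "sample_%d" % s)
        os.makedirs(folder)
        row = []
        for f in range(n_frames):
            # Every pixel of a frame holds the same value, 10 * sample + frame.
            pixels = np.full((height, width, 3), 10 * s + f, dtype=np.uint8)
            path = os.path.join(folder, "frame_%03d.png" % f)
            tf.io.write_file(path, tf.io.encode_png(pixels))
            row.append(path)
        table.append(row)
    return table


demo_files = make_demo_files(tempfile.mkdtemp())
dataset = tf.data.Dataset.from_tensor_slices(demo_files)

wide = next(iter(dataset.map(load_with_py_function)))
stacked = next(iter(dataset.map(load_with_tf_ops)))
print(wide.shape)     # frames concatenated along the width axis
print(stacked.shape)  # frames stacked on a new leading axis
```

With real data, demo_files would be replaced by the nested list of paths built in the question (train_files), and num_parallel_calls can be passed to map as before.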
Upvotes: 1