Li haonan

Reputation: 630

Tensorflow: Dataset.from_generator() ValueError: setting an array element with a sequence

Goal

I want to use a tensor as part of the values yielded to the `Dataset.from_generator` method.

Error Message

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1321     try:
-> 1322       return fn(*args)
   1323     except errors.OpError as e:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1306       return self._call_tf_sessionrun(
-> 1307           options, feed_dict, fetch_list, target_list, run_metadata)
   1308 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1408           self._session, options, feed_dict, fetch_list, target_list,
-> 1409           run_metadata)
   1410     else:

InvalidArgumentError: ValueError: setting an array element with a sequence.
Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 157, in __call__
    ret = func(*args)

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 391, in generator_py_func
    nest.flatten_up_to(output_types, values), flattened_types)

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 390, in <listcomp>
    for ret, dtype in zip(

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 124, in _convert
    result = np.asarray(value, dtype=dtype, order="C")

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy/core/numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)

ValueError: setting an array element with a sequence.


     [[Node: PyFunc = PyFunc[Tin=[DT_INT64], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_150"](arg0)]]
     [[Node: IteratorGetNext_22 = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_22)]]

During handling of the above exception, another exception occurred:

Error Reproduce

If you replace the definition `b = tf.ones(...)` with `b = np.random.randn(1)`, the error disappears.

import numpy as np
import tensorflow as tf

def _create_generator():
    for i in range(3):
        a = np.random.randn(3, 2)
        b = tf.ones([1], tf.float32)   # yielding a tf.Tensor triggers the error
        # b = np.random.randn(1)       # yielding a NumPy array instead works fine
        yield {'a': a, 'b': b}


dataset = tf.data.Dataset.from_generator(_create_generator,
                        output_shapes={'a': None, 'b': None},
                        output_types={'a': tf.float32, 'b': tf.float32}).batch(1)
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()


init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(features))
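For context on where this blows up: the traceback shows the failure happens in `script_ops.py`, where every value the generator yields is passed through `np.asarray(value, dtype=dtype)`. Anything that is not a plain, rectangular array of numbers fails there, and a `tf.Tensor` is one such object. A NumPy-only sketch of the same error class (using a ragged nested sequence as the non-convertible value, not the exact `tf.Tensor` case):

```python
import numpy as np

# np.asarray with an explicit numeric dtype requires a rectangular
# array of numbers; a ragged nested sequence raises the same
# "setting an array element with a sequence" ValueError seen above.
try:
    np.asarray([np.zeros((3, 2)), np.zeros(1)], dtype=np.float32)
except ValueError as e:
    print(e)  # setting an array element with a sequence. ...
```

The takeaway is that `from_generator` expects values NumPy can convert directly, which a symbolic graph-mode tensor is not.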

Why I have to use a tensor as input

This is because my real program needs to use the output of another `tf.data.Dataset` as part of the input (the data is stored as TFRecords), and that setup raises exactly the same error as the snippet above. Right now I don't see any way to work around the problem indirectly (i.e. without passing a tensor into the generator).

Why I need to use Dataset.from_generator

There is a hack for using `estimator.predict()` without reloading the graph on every call: use a generator to keep the input pipeline open, so `predict()` assumes you are not yet done with a "single" prediction. TensorFlow then does not load the model graph again and again.


If you need more background information, let me know. Thank you!

Edit1:

Why I have to use Dataset API

The data volume is huge and originally stored on HDFS, so the pipeline was built in Spark and the data saved as TFRecords. As far as I know, the Dataset API is the only way to restore the data here (and it is also the better choice performance-wise).

Upvotes: 1

Views: 230

Answers (1)

DomJack

Reputation: 4183

Furthering the discussion from the comments: Estimator.predict isn't doing anything magical. There is admittedly some fancy machinery, mostly for running in parallel across multiple GPUs, but you can always construct the graph manually via Estimator.model_fn:

estimator = get_estimator()           # however you generate it
features, labels = input_fn()         # whatever you would use with `predict`
mode = tf.estimator.ModeKeys.PREDICT  # or TRAIN/EVAL
# depending on your estimator, you may not need mode/config args
spec = estimator.model_fn(features, labels, mode, config=None)
# spec is a tf.estimator.EstimatorSpec - a named tuple
predictions = spec.predictions
# you might have to flatten the function inputs/outputs/Tout below
next_features = tf.py_func(
    next_features_fn, predictions, Tout={'a': tf.float32, 'b': tf.float32})

Upvotes: 1
