Reputation: 268
I am trying to feed data from a Pandas dataframe into a TensorFlow pipeline. I tried to do this:
training_dataset = (tf.data.Dataset.from_tensor_slices((
    tf.cast(df[df.columns[:-1]].values, tf.float32),
    tf.cast(df[df.columns[-1]].values, tf.int32))))
where df is my dataframe. However, the dataframe is very large and I got this error:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
Should I split the dataframe and create several tensor datasets, and would that even work? What is the best way to approach this? I thought about feeding the data in through feed_dict, but I couldn't figure out how to go about it.
Upvotes: 1
Views: 1300
Reputation: 4543
There's no need to manually split your dataframe. You can use tf.placeholder to avoid hitting the 2 GB GraphDef limit: the placeholder keeps the actual values out of the graph definition, and you feed them in when the dataset's iterator is initialized. Create NumPy arrays from the dataframe with DataFrame.values first.
Take a look at this: https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays
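As a rough sketch along the lines of that guide (assuming a TF 1.x session-based setup and, as in your snippet, that the last column is the label; the file name data.csv is just a placeholder):

import tensorflow as tf
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical dataframe
features = df[df.columns[:-1]].values.astype("float32")
labels = df[df.columns[-1]].values.astype("int32")

# Placeholders keep the data out of the GraphDef, so the 2 GB limit no longer applies.
features_ph = tf.placeholder(tf.float32, shape=features.shape)
labels_ph = tf.placeholder(tf.int32, shape=labels.shape)

training_dataset = (tf.data.Dataset
                    .from_tensor_slices((features_ph, labels_ph))
                    .shuffle(10000)
                    .batch(32))

iterator = training_dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    # The NumPy arrays are fed once, when the iterator is initialized.
    sess.run(iterator.initializer,
             feed_dict={features_ph: features, labels_ph: labels})
    while True:
        try:
            batch_features, batch_labels = sess.run(next_batch)
            # ... train on the batch ...
        except tf.errors.OutOfRangeError:
            break

This also answers the feed_dict part of your question: the feed only happens at iterator initialization, not on every training step.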
Upvotes: 1