Reputation: 268
I am trying to feed data from a Pandas dataframe into a TensorFlow pipeline. I tried to do this:
training_dataset = (tf.data.Dataset.from_tensor_slices((
    tf.cast(df[df.columns[:-1]].values, tf.float32),
    tf.cast(df[df.columns[-1]].values, tf.int32))))
where df is my dataframe. However, the dataframe is very large and I got this error:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
Should I split the dataframe and create several tensor datasets, and would that even work? What is the best way to approach this? I thought about feeding the data in through feed_dict, but I couldn't figure out how to go about it.
Upvotes: 1
Views: 1300
Reputation: 4543
There's no need to manually split your dataframe. You can use tf.placeholder to avoid hitting the 2 GB GraphDef limit: the placeholder keeps the actual values out of the graph definition, and you feed them in when the dataset's iterator is initialized. Create NumPy arrays from the dataframe with DataFrame.values first.
Take a look at this: https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays
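As a rough sketch along the lines of that guide (assuming a TF 1.x session-based setup and, as in your snippet, that the last column is the label; the file name data.csv is just a placeholder):

import tensorflow as tf
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical dataframe
features = df[df.columns[:-1]].values.astype("float32")
labels = df[df.columns[-1]].values.astype("int32")

# Placeholders keep the data out of the GraphDef, so the 2 GB limit no longer applies.
features_ph = tf.placeholder(tf.float32, shape=features.shape)
labels_ph = tf.placeholder(tf.int32, shape=labels.shape)

training_dataset = (tf.data.Dataset
                    .from_tensor_slices((features_ph, labels_ph))
                    .shuffle(10000)
                    .batch(32))

iterator = training_dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    # The NumPy arrays are fed once, when the iterator is initialized.
    sess.run(iterator.initializer,
             feed_dict={features_ph: features, labels_ph: labels})
    while True:
        try:
            batch_features, batch_labels = sess.run(next_batch)
            # ... train on the batch ...
        except tf.errors.OutOfRangeError:
            break

This also answers the feed_dict part of your question: the feed only happens at iterator initialization, not on every training step.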
Upvotes: 1