anon_swe

Reputation: 9345

Tensorflow: ValueError: Cannot create a tensor proto whose content is larger than 2GB

I'm using the tf.data.Dataset API and have a dataset that's ~500K rows and ~1,800 columns. When I try:

dataset = tf.data.Dataset.from_tensor_slices(({"reviews": data}, labels))

I get back:

ValueError: Cannot create a tensor proto whose content is larger than 2GB.

I've googled around and seen a lot of people run into this issue, but I haven't found a satisfactory answer. Is there a way to get around this limit, or a TensorFlow approach that will break up my dataset? I already batch it, but that happens after calling:

dataset = tf.data.Dataset.from_tensor_slices(({"reviews": data}, labels))

For what it's worth, my code to read the data from CSV into a tf.data.Dataset works when I use 10% of the data.

Any suggestions would be awesome!

Upvotes: 0

Views: 1161

Answers (1)

Smokrow

Reputation: 241

Depending on your dataset, you could try streaming it with the tf.data API instead of materializing it all at once: either convert your dataset into TFRecord files, or read the CSV files directly. The tf.data API takes care of loading the next batches from disk in the background while the current batch is being trained on, which also speeds up training significantly.
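Since your data is already in a CSV, here is a minimal sketch of the CSV route, which avoids the 2GB limit because rows are streamed from disk rather than embedded in the graph as a constant. The file name ("reviews.csv") and the label column name ("label") are placeholders for your own data; make_csv_dataset is available from TF 1.13 / 2.x onward.

import tensorflow as tf

# Sketch: stream the CSV instead of calling from_tensor_slices on an
# in-memory array. Nothing is baked into the graph as a giant constant.
dataset = tf.data.experimental.make_csv_dataset(
    "reviews.csv",        # placeholder path to your CSV file
    batch_size=32,        # batching happens while streaming
    label_name="label",   # placeholder name of the label column
    num_epochs=1,
    shuffle=True,
)

# Each element is a (features, labels) pair, where features is a dict
# mapping column name -> batched column tensor (TF 2.x eager iteration).
for features, labels in dataset.take(1):
    print(list(features.keys())[:5], labels.shape)

The same idea applies to the TFRecord route: write the rows out once with tf.io.TFRecordWriter, then read them back with tf.data.TFRecordDataset, so the full dataset never has to fit into a single tensor proto.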

Upvotes: 1
