Reputation: 9345
I'm using the tf.data.Dataset API and have a dataset that's ~500K rows and ~1,800 columns. When I try:
dataset = tf.data.Dataset.from_tensor_slices(({"reviews": data}, labels))
I get back:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
I've googled around and seen a lot of people run into this issue, but no satisfactory answers. Is there a way to get around this limit, or a TensorFlow approach that will break up my dataset? I already batch it, but that happens after calling:
dataset = tf.data.Dataset.from_tensor_slices(({"reviews": data}, labels))
For what it's worth, my code to read the data from CSV into a tf.data.Dataset works when I use 10% of the data.
Any suggestions would be awesome!
Upvotes: 0
Views: 1161
Reputation: 241
Depending on your dataset, you could try using the tf.data Dataset API with file-based input instead of loading everything into memory: convert your dataset into TFRecord files, or read the CSV files directly. The Dataset API takes care of loading your data in the background while you are training on other batches, which also speeds up training significantly.
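A minimal sketch of the CSV route, assuming your data sits in one or more CSV files with a text column named "reviews" and a label column named "label" (the file pattern and column names here are placeholders for your own):

```python
import tensorflow as tf

# Stream rows from CSV files instead of materialising one giant tensor,
# which is what triggers the 2GB protobuf limit with from_tensor_slices.
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern="data/train-*.csv",   # hypothetical path to your CSV shards
    batch_size=32,                     # batching happens while streaming
    select_columns=["reviews", "label"],
    label_name="label",
    num_epochs=1,
    shuffle=True,
)

# Each element is already a (features_dict, labels) pair, so it can be
# passed straight to model.fit(dataset).
```

If you need more throughput or preprocessing baked in, the same idea applies with TFRecord: write your rows out once with tf.io.TFRecordWriter and read them back with tf.data.TFRecordDataset plus tf.io.parse_single_example.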
Upvotes: 1