Reputation: 503
I have a text corpus of 50 documents, each about 80 lines long. I want to feed this corpus to TensorFlow as input, with each document read as its own batch. In other words, similar to how TFRecord is used for images, I want to use tf.data to batch each document in my corpus so that the documents are read sequentially, one at a time.
How can I do this?
Upvotes: 0
Views: 470
Reputation: 5926
You can create a TextLineDataset that will contain the lines of your documents:
dataset = tf.data.TextLineDataset(['doc1.txt', 'doc2.txt', ...])
After you create the dataset, you can group its lines into batches using the batch method, together with other methods of the Dataset class.
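Note that batch groups a fixed number of lines, so if the documents are only approximately 80 lines long, a plain batch(80) over the combined dataset would not line up with document boundaries. Here is a minimal sketch of one possible way to get exactly one batch per document: build a dataset of file names, then batch each file's lines separately with a batch size larger than any document. The helper name make_document_dataset, the bound MAX_LINES_PER_DOC and the demo files are assumptions made for illustration, not part of the question.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical upper bound on the number of lines in any document
# (an assumption; pick a value safely above your longest document).
MAX_LINES_PER_DOC = 1000


def make_document_dataset(paths, max_lines=MAX_LINES_PER_DOC):
    """Return a dataset that yields one batch of lines per document."""
    files = tf.data.Dataset.from_tensor_slices(paths)
    # flat_map visits the files in order. Because max_lines exceeds the
    # length of each file, every file produces exactly one batch.
    return files.flat_map(
        lambda path: tf.data.TextLineDataset(path).batch(max_lines)
    )


# Demo with two throwaway files of different lengths.
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for name, n_lines in [("doc1.txt", 3), ("doc2.txt", 2)]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write("\n".join(f"{name} line {i}" for i in range(n_lines)) + "\n")
    demo_paths.append(path)

# Each element is a 1-D string tensor holding all lines of one document.
doc_batches = [batch.numpy().tolist() for batch in make_document_dataset(demo_paths)]
for lines in doc_batches:
    print(len(lines), lines[0])
```

Each element of the resulting dataset is a variable-length tensor of strings, so any later step that needs fixed shapes (for example after tokenization) would have to pad or truncate the documents.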
Upvotes: 2