brelian

Reputation: 503

Convert text documents to a tf.data dataset in TensorFlow for sequential reading

My corpus contains 50 text documents, each about 80 lines long. I want to feed the corpus to TensorFlow as input, but I want each document to form its own batch as it is read. In other words, similar to how TFRecord is used for images, I want to use tf.data to batch each document in the corpus so that the documents are read sequentially.

How can I solve this issue?

Upvotes: 0

Views: 470

Answers (1)

MatthewScarpino

Reputation: 5926

You can create a TextLineDataset that will contain the lines of your documents:

dataset = tf.data.TextLineDataset(['doc1.txt', 'doc2.txt', ...])

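After you create the dataset, you can group the lines into batches using the batch method and transform them further with other methods of the Dataset class. The dataset yields lines file by file, in the order the files are listed.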
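If each document should end up in its own batch, one possible approach is to build a small TextLineDataset per file and batch it before the files are chained together. The following is only a sketch under some assumptions: the three temporary files stand in for the real corpus, and MAX_LINES and one_document are hypothetical names, with MAX_LINES assumed to be larger than the longest document.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in corpus: three small text files in a temp directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, n_lines in enumerate([3, 2, 4]):
    path = os.path.join(tmp_dir, f"doc{i}.txt")
    with open(path, "w") as f:
        f.write("\n".join(f"doc{i} line{j}" for j in range(n_lines)))
    paths.append(path)

# Assumption: no document has more lines than this, so each file
# fits into exactly one batch.
MAX_LINES = 1000


def one_document(path):
    # Read the lines of a single file and group them into one batch.
    return tf.data.TextLineDataset(path).batch(MAX_LINES)


# flat_map visits the files in the listed order, so the documents are
# produced sequentially and a batch never mixes lines from two files.
dataset = tf.data.Dataset.from_tensor_slices(paths).flat_map(one_document)

# Each element is a 1-D string tensor holding the lines of one document.
doc_sizes = [int(batch.shape[0]) for batch in dataset]
print(doc_sizes)
```

Batching inside flat_map, rather than calling batch on the combined dataset, is what keeps document boundaries intact when the documents differ in length.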

Upvotes: 2
