Reputation: 3650
I would like to read pdf files into the pipeline. However, I haven't found any apache beam example regarding file formats other than plain text or xml.
Upvotes: 0
Views: 597
Reputation: 191
There is no pre-existing PDF reader available in Dataflow or Apache Beam libraries. However, you could use the example of this reader for TensorFlow records as a model to write your own using the PDF parsing library of your choice.
Upvotes: 1