Nate Saint

Reputation: 11

Using IBM Watson Document Converter to parse PDFs

My task is to use IBM Watson to convert a PDF into a text file, or into any other output that I can work with.

The PDF is a purchase order created by a customer and sent to us in varying formats. The customer can create these purchase orders any way that they wish, and I must parse them.

I have tried just using the Document Converter with default settings and the output is all over the place.

Any advice on how to approach this would be great. Perhaps there is a way to use IBM Watson's intelligence to find the required information in these purchase orders even when they do not follow a defined format.

Thanks for any help.

Upvotes: 0

Views: 1034

Answers (1)

Sayuri Mizuguchi

Reputation: 5330

You can check the API Reference documentation from IBM Developers to verify my answer.

I'll assume that you are using cURL, but the links also include examples in Node.js, Python, and Java if you prefer. The usage is practically the same in each case.

Here is an example of the convert method with cURL:

curl -X POST -u "{username}":"{password}" -F config="{\"conversion_target\":\"answer_units\"}" -F "file=@{path_to_your_file}" "https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15"

In the file parameter, you supply the document you want to convert, for example a PDF. To build your own conversion, replace the file in the cURL command with your own PDF, HTML, or Word document, and set "conversion_target" inside config to the format you want to convert into. Valid values are "answer_units", "normalized_html", or "normalized_text".
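If you would rather build the request from Python, the config part can be sketched as below. This is only a minimal sketch using the standard library: the URL, version, and target names are copied from the cURL example above, while build_config and build_request are hypothetical helper names and not part of any IBM SDK. It does not send anything; it only prepares and validates the values you would pass to your HTTP client.

```python
import json

# Conversion targets named in the answer above.
VALID_TARGETS = ("answer_units", "normalized_html", "normalized_text")

# Endpoint and version copied from the cURL example above.
CONVERT_URL = "https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document"
API_VERSION = "2015-12-15"


def build_config(conversion_target):
    """Return the JSON string to send as the `config` form field (hypothetical helper)."""
    if conversion_target not in VALID_TARGETS:
        raise ValueError(
            "conversion_target must be one of: " + ", ".join(VALID_TARGETS)
        )
    return json.dumps({"conversion_target": conversion_target})


def build_request(conversion_target):
    """Collect the URL, query parameters, and config field for one call (hypothetical helper)."""
    return {
        "url": CONVERT_URL,
        "params": {"version": API_VERSION},
        "config": build_config(conversion_target),
    }


if __name__ == "__main__":
    # Plain text is one of the three targets listed above.
    print(build_request("normalized_text"))
```

For purchase orders that arrive in varying layouts, "normalized_text" may be the simplest target to start with, since you can then search the text yourself, but that is a guess you should check against your own documents.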

  • You can see one example from IBM Developers inside GitHub here.

  • Fork this example here.

  • The official documentation has a tutorial on converting documents with this service; check here.

Upvotes: 2
