TajyMany

Reputation: 537

Document Conversion Watson service not working?

I've been trying to use the IBM Watson Document Conversion service with the demo PDF, but it isn't splitting the document into smaller pieces. All it does is create one answer unit that's really long:

"text": "Watson is an artificially intelligent computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO and industrialist Thomas J. Watson.[3][4] The computer system was specifically developed to answer questions on the quiz show Jeopardy![5] In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings.[3][6] Watson received the first place prize of $1 million.[7] Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage[8] including the full text of Wikipedia,[9] but was not connected to the Internet during the game.[10][11] For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words. In February 2013, IBM announced that Watson software system's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan- Kettering Cancer Center in conjunction with health insurance company WellPoint.[12] IBM Watson's former business chief Manoj Saxena says that 90% of nurses in the field who use Watson now follow its guidance.[13]"

Thanks in advance!

Upvotes: 3

Views: 298

Answers (1)

Matt F

Reputation: 712

Unfortunately, that demo PDF is not the best document to use. Currently, Answer Units are split based on heading tags (h1 through h6), and that PDF doesn't contain any headings. =(
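To make the effect concrete, here's a toy sketch of heading-based splitting. This is not the service's actual implementation, just an illustration using Python's standard html.parser; the names HeadingSplitter and split_units are made up for this example. The point is that input with only paragraphs collapses into a single unit, while input with headings yields one unit per heading.

```python
from html.parser import HTMLParser

# Tags that start a new unit in this toy model (assumption: h1 through h6 only).
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSplitter(HTMLParser):
    """Hypothetical splitter: opens a new unit at every heading tag."""

    def __init__(self):
        super().__init__()
        self.units = []  # each unit is {"title": str, "text": str}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.units.append({"title": "", "text": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk:
            return
        if not self.units:
            # Text that appears before any heading goes into one untitled unit.
            self.units.append({"title": "", "text": ""})
        key = "title" if self._in_heading else "text"
        unit = self.units[-1]
        unit[key] = (unit[key] + " " + chunk).strip()


def split_units(html):
    """Return the list of units found in an HTML string."""
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    return parser.units


# Paragraphs only: everything lands in a single unit.
print(len(split_units("<p>One.</p><p>Two.</p>")))
# With headings: one unit per heading.
print(len(split_units("<h1>Intro</h1><p>A</p><h2>History</h2><p>B</p>")))
```

A document like the original demo PDF corresponds to the first call: plenty of paragraphs, no headings, so only one unit comes out.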

If you set the conversion_target to NORMALIZED_HTML, you'll be able to see the converted PDF before it is split up into Answer Units. It will contain paragraphs but no headings.
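As a rough sketch, the configuration for that could be built like this. Only the conversion_target key and the NORMALIZED_HTML value come from the answer above; the helper name make_config is made up for illustration, and how you send the config (endpoint, credentials, file upload) depends on your client and isn't shown here.

```python
import json


def make_config(conversion_target):
    """Hypothetical helper: serialize a conversion config as a JSON string."""
    return json.dumps({"conversion_target": conversion_target})


# Ask for the intermediate HTML instead of the split Answer Units.
config = make_config("NORMALIZED_HTML")
print(config)
```

Inspecting the returned HTML for h1 through h6 tags is a quick way to check whether a given document will split at all.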

In the future, we expect to also allow splitting Answer Units by paragraph, but that hasn't been released yet.

UPDATE: We've replaced the PDF on the demo site with one that's a much better example.

Upvotes: 6
