What is the best way to format .DOC documents for the Retrieve and Rank web interface's document uploader so that it splits them into answers as well as possible? (I am using https://watson-retrieve-and-rank.ng.bluemix.net)
We have to create a set of documents, and I can't find any guide on how to format them. For example, would a particular text size or bold text for the title, the body of the answer, etc. improve the automated answer splitting? The team creating these documents cannot prepare them in proper JSON format, and the service parses some of the DOC files as a single one-page answer without any splitting.
Of course, there may be another tool for this task that I am missing.
Thanks for any experience or links.
The detailed documentation is at https://www.ibm.com/watson/developercloud/doc/document-conversion/customizing.shtml#htmlau, because the tooling uses the default settings of the Document Conversion service.
To summarise, though: the tooling splits Word documents at every paragraph that uses a style named "Heading N", where "N" is a number.
This covers the built-in default styles in MS Word ("Heading 1", "Heading 2", "Heading 3", "Heading 4", "Heading 5", "Heading 6", "Heading 7", "Heading 8" and "Heading 9"). It also covers custom styles you create with names that follow the same pattern (e.g. "Heading 123"). Text size or bold formatting alone does not trigger a split; the style name is what matters.
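To check documents before uploading them, the rule can be approximated offline. The following is a minimal Python sketch, not IBM's implementation: it assumes the style name must match `Heading <number>` exactly and case-sensitively, and the function names `is_split_heading` and `split_into_answer_units` are hypothetical. It works on (style name, text) pairs; for a .docx file (python-docx cannot read legacy binary .doc), the python-docx library can produce them with `[(p.style.name, p.text) for p in Document(path).paragraphs]`.

```python
import re
from typing import Dict, List, Tuple

# Assumed rule: a paragraph starts a new answer unit when its style name
# is exactly "Heading" followed by a space and a number ("Heading 1", "Heading 123").
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def is_split_heading(style_name: str) -> bool:
    """Return True if this style name would trigger a split under the assumed rule."""
    return HEADING_STYLE.match(style_name) is not None


def split_into_answer_units(paragraphs: List[Tuple[str, str]]) -> List[Dict]:
    """Group (style_name, text) paragraphs into units, one per heading paragraph.

    Paragraphs before the first heading go into a unit with an empty title.
    A document with no heading styles at all therefore comes back as a
    single unit, i.e. one big answer.
    """
    units: List[Dict] = []
    current: Dict = {"title": "", "body": []}
    for style_name, text in paragraphs:
        if is_split_heading(style_name):
            # Close the previous unit if it holds anything, then start a new one.
            if current["title"] or current["body"]:
                units.append(current)
            current = {"title": text, "body": []}
        else:
            # Any other style (Normal, Title, bold text, ...) stays in the current unit.
            current["body"].append(text)
    if current["title"] or current["body"]:
        units.append(current)
    return units


if __name__ == "__main__":
    # Hypothetical sample document as (style name, text) pairs.
    sample = [
        ("Heading 1", "How do I reset my password?"),
        ("Normal", "Open Settings and choose Reset."),
        ("Heading 2", "What if I forgot my username?"),
        ("Normal", "Contact support."),
        ("Title", "Not a split point"),
    ]
    for unit in split_into_answer_units(sample):
        print(unit["title"], "->", len(unit["body"]), "paragraph(s)")
```

Running the check on a file that comes back as one unit is a quick way to spot documents whose titles are only visually formatted (larger or bold text) rather than styled as headings.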