Reputation: 17
How can I upload MS Office documents (with extensions such as .docx, .xls, etc.) or .pdf files, and then search for a word in those files, using the Java API?
I have tried the following to upload the .docx file:
// classpath resource names always use "/" as the separator, not File.separator
InputStream docStream = Example.class.getClassLoader().getResourceAsStream(
        "data/Resume.docx");
GenericDocumentManager manager = client.newDocumentManager();
DocumentMetadataHandle handleMetaData = new DocumentMetadataHandle();
// create a handle on the content
InputStreamHandle handle = new InputStreamHandle(docStream);
// write the document content
manager.write("/example/resume.docx", handleMetaData, handle);
To search, I have tried the following:
GenericDocumentManager manager = client.newDocumentManager();
// the query manager is what builds the string query definition
QueryManager queryMgr = client.newQueryManager();
StringQueryDefinition query =
        queryMgr.newStringDefinition().withCriteria("pavan");
// fetch the first page of matching documents
DocumentPage documents = manager.search(query, 1);
while (documents.hasNext()) {
    DocumentRecord document = documents.next();
    System.out.println("document" + document.getContent(new StringHandle()));
}
Please help me with the logic and code for this.
Upvotes: 1
Views: 84
Reputation: 11214
In this case you'd have to apply some conversion. MarkLogic stores binary documents as binary nodes (binary documents here being the files you're referring to: PDF, DOCX, etc.). Binary nodes are not searchable as text. There are quite a few ways to achieve the conversion:
I hope these resources will help you out. You can also attend a Developer or Admin training where these concepts are explained; more information is available here: http://www.marklogic.com/training/
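To make the conversion idea concrete, here is a minimal client-side sketch, not MarkLogic's own conversion facility. It assumes the file is a .docx (a zip package whose main text is in word/document.xml) and uses only the JDK to pull out the paragraph text. The class and method names (DocxTextExtractor, extractText, sampleDocx) are hypothetical, introduced only for illustration. Real documents also contain tables, headers, footnotes and so on, which this sketch does not handle specially, so a dedicated conversion tool is likely the more robust choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts plain text from a .docx so it can be stored as a text document.
class DocxTextExtractor {
    // WordprocessingML namespace used by word/document.xml inside a .docx package.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the paragraph text of a .docx stream, one paragraph per line. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current zip entry
                    return paragraphs(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    private static String paragraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // refuse DOCTYPE declarations to avoid external-entity processing
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        StringBuilder text = new StringBuilder();
        NodeList paras = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            // concatenate every text run (w:t) inside this paragraph (w:p)
            NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }

    /** Builds a tiny in-memory .docx-like package for demonstration; text is not XML-escaped. */
    static byte[] sampleDocx(String... paragraphs) throws IOException {
        StringBuilder xml = new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx("Resume of pavan", "Java developer");
        System.out.print(extractText(new ByteArrayInputStream(docx)));
    }
}
```

The extracted string could then be written alongside the original binary as a separate text document (for example through a StringHandle), so that a string query such as "pavan" has text to match against. Whether to keep the binary, the text, or both is a design choice that depends on whether the original file must be returned to users.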
Upvotes: 2