Pavan

Reputation: 17

MarkLogic: uploading DOCX-type documents (other than TXT, JSON and XML) and searching within the file

How can I upload MS Office documents (with extensions such as .docx, .xls, etc.) or .pdf files, and search for a word within these files, using the Java API?

I have tried the following to upload the .docx file:

    // Classpath resource names always use "/", so File.separator ("\" on Windows) must not be used here
    InputStream docStream = Example.class.getClassLoader().getResourceAsStream(
            "data/Resume.docx");

    GenericDocumentManager manager = client.newDocumentManager();

    DocumentMetadataHandle handleMetaData = new DocumentMetadataHandle();

    // create a handle on the content (a .docx file is stored as a binary document)
    InputStreamHandle handle = new InputStreamHandle(docStream);

    // write the document content
    manager.write("/example/resume.docx", handleMetaData, handle);

To search, I have tried the following:

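    // the query manager builds the query definition used by the document manager
    QueryManager queryMgr = client.newQueryManager();
    GenericDocumentManager manager = client.newDocumentManager();
    StringQueryDefinition query =
            queryMgr.newStringDefinition().withCriteria("pavan");

    // return the page of matching documents, starting at result 1
    DocumentPage documents = manager.search(query, 1);
    while (documents.hasNext()) {
        DocumentRecord document = documents.next();
        System.out.println("document: " + document.getContent(new StringHandle()).get());
    }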

Please help me with the logic and code for this.

Upvotes: 1

Views: 84

Answers (1)

Tamas

Reputation: 11214

In this case you have to apply some conversion. MarkLogic stores binary documents (which is what you are referring to: PDF, DOCX, etc.) as binary nodes, and binary nodes are not searchable. There are quite a few ways to achieve the conversion:

I hope these resources help. You can also attend a Developer or Admin training course, where these concepts are explained; more information is available here: http://www.marklogic.com/training/
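One client-side way to do the conversion described above, sketched here as an assumption rather than as one of the approaches the answer recommends, is to extract the text yourself before writing to MarkLogic. A .docx file is a ZIP package whose body text sits in `<w:t>` elements of `word/document.xml`, so the JDK's `java.util.zip` and StAX APIs are enough for a rough extraction. The class name `DocxTextExtractor` and the `<rendition>` wrapper format are hypothetical, and this only covers .docx (not .xls or .pdf).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DocxTextExtractor {

    // WordprocessingML namespace of the <w:p> (paragraph) and <w:t> (text run) elements
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the plain text of a .docx package, one line per paragraph. The caller closes the stream. */
    public static String extractText(InputStream docx) throws IOException, XMLStreamException {
        ZipInputStream zip = new ZipInputStream(docx);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if ("word/document.xml".equals(entry.getName())) {
                return readParagraphs(zip);
            }
        }
        throw new IOException("word/document.xml not found; not a .docx package?");
    }

    private static String readParagraphs(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // uploaded files are untrusted input, so DTD processing is disabled
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())
                    && "t".equals(reader.getLocalName())) {
                inText = true; // entering a <w:t> text run
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                if ("t".equals(reader.getLocalName())) {
                    inText = false;
                } else if ("p".equals(reader.getLocalName())) {
                    out.append('\n'); // end of a paragraph
                }
            } else if (inText && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                out.append(reader.getText());
            }
        }
        reader.close();
        return out.toString().trim();
    }

    /** Hypothetical rendition format: wraps extracted text so it can be stored as a searchable XML document. */
    public static String toXmlRendition(String sourceUri, String text) {
        return "<rendition><source-uri>" + escape(sourceUri) + "</source-uri><text>"
                + escape(text) + "</text></rendition>";
    }

    // minimal XML escaping for element content; "&" must be replaced first
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The resulting string could then be stored next to the binary with something like `client.newXMLDocumentManager().write("/example/resume.xml", new StringHandle(xml))` (the URI is illustrative), after which a string query such as the one in the question has text to match. StAX is used instead of a DOM so that large documents are streamed rather than loaded into memory.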

Upvotes: 2
