Reputation: 231
We have a bunch of PDF documents stored in EMC Documentum. We need to integrate Apache Solr with Documentum so that we can search for a specific document in Solr and then retrieve that document from Documentum.
I looked at the link below, but it does not provide sufficient information: https://community.emc.com/docs/DOC-6520
Any help is really appreciated.
Upvotes: 0
Views: 714
Reputation: 5565
I have built my own connector to extract data from Documentum and insert it into Elasticsearch or Solr, and I am willing to share it. Please contact me.
Upvotes: 0
Reputation: 9500
The link you have posted would get you a working solution. That author proposes to write a custom crawler that connects to the Documentum repository and then use Apache Tika to perform the content extraction for Solr.
However, I would suggest using Apache ManifoldCF together with Apache Tika instead. From the project descriptions:
Apache ManifoldCF is an effort to provide an open source framework for connecting source content repositories like Microsoft Sharepoint and EMC Documentum, to target repositories or indexes, such as Apache Solr, Open Search Server, or ElasticSearch. Apache ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
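If you do go the custom-crawler route from the linked post, the Solr side can be sketched roughly as below. This is only a minimal sketch under some assumptions: Solr's extracting request handler (`/update/extract`, which runs Tika on the server) is enabled on your core, the core URL `http://localhost:8983/solr/documents` is a placeholder, and the Documentum object ID (for example the `r_object_id`) is used as the Solr `id` so that a search hit can be mapped back to the document in Documentum. Fetching the PDF bytes from Documentum is not shown, and the function names are made up for illustration.

```python
import urllib.parse
import urllib.request

# Placeholder core URL; adjust to your own Solr installation.
SOLR_CORE_URL = "http://localhost:8983/solr/documents"


def build_extract_url(core_url, object_id, commit=True):
    """Build the URL for Solr's extracting request handler.

    literal.id stores the Documentum object ID as the Solr document id
    (an assumption of this sketch, not something Solr requires).
    """
    params = {"literal.id": object_id, "commit": "true" if commit else "false"}
    return f"{core_url}/update/extract?{urllib.parse.urlencode(params)}"


def index_pdf(core_url, object_id, pdf_bytes):
    """POST raw PDF bytes to Solr, which uses Tika to extract the text."""
    request = urllib.request.Request(
        build_extract_url(core_url, object_id),
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def build_search_url(core_url, text):
    """Build a select query that returns only the ids of matching documents."""
    params = {"q": text, "fl": "id", "wt": "json"}
    return f"{core_url}/select?{urllib.parse.urlencode(params)}"
```

Your crawler would call `index_pdf(SOLR_CORE_URL, object_id, pdf_bytes)` for each document it reads from Documentum. At search time, the `id` values returned by the query built with `build_search_url` tell you which objects to fetch from Documentum.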
Upvotes: 1