Reputation: 708
I saw this relatively old blog post about Cloudant's search feature.
I have a few questions, since we use the Cloudant Heroku add-on and need to support search queries:
Is it possible to store the Lucene-based search-indexes WITHIN the CouchDB itself, so that if we replicate the DB (say, to a Couchbase/CouchDB on mobile devices), then the index data also comes with it?
Will indexing work on replicated CouchDB databases or ONLY on Cloudant?
What if we have PDFs stored as attachments in CouchDB documents? Is there support to index and search such fields out-of-the-box? Should we parse the PDFs & write our own Analyzers which we then import into Cloudant?
What is the best possible approach if we would like to support searching the contents of PDF 'attachments' in CouchDB documents that get replicated from Cloudant to local CouchDB instances on mobile devices?
Would be great if anyone could provide some pointers for achieving these via Cloudant.
I do know there are some alternatives like CouchDB-Lucene as mentioned here.
But since we are using Cloudant as the central CouchDB, was curious to know if this could be done easily.
Thanks
Upvotes: 1
Views: 406
Reputation: 1836
Is it possible to store the Lucene-based search-indexes WITHIN the CouchDB itself, so that if we replicate the DB (say, to a Couchbase/CouchDB on mobile devices), then the index data also comes with it?
The search indexes on Cloudant are always stored outside the database. Like view data, they will not be replicated. Otherwise, we couldn't use Lucene's highly optimised on-disk format.
Will indexing work on replicated CouchDB databases or ONLY on Cloudant?
Search indexing will only work on Cloudant (using the "indexes" field in a design doc). You would need a separate solution for the mobile device or replicated vanilla-CouchDB instance.
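As a rough sketch of what such a design document could look like, the Python snippet below builds one as a plain dict. The design doc name `search`, the index name `content`, and the document field `pdf_text` are made up for illustration; the index function itself is JavaScript stored as a string.

```python
import json

# JavaScript index function, stored as a string in the design doc.
# `pdf_text` is a hypothetical field holding text extracted from a PDF.
INDEX_FUNCTION = """function (doc) {
  if (doc.pdf_text) {
    index("default", doc.pdf_text);
  }
}"""


def build_search_design_doc():
    """Return a design document with one search index under "indexes"."""
    return {
        "_id": "_design/search",  # hypothetical design doc name
        "indexes": {
            "content": {  # hypothetical index name
                "analyzer": "standard",
                "index": INDEX_FUNCTION,
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be saved to the Cloudant database.
    print(json.dumps(build_search_design_doc(), indent=2))
```

The design doc is an ordinary document, so it will replicate along with the rest of the database, but as noted above only Cloudant will build a search index from it.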
What if we have PDFs stored as attachments in CouchDB documents? Is there support to index and search such fields out-of-the-box? Should we parse the PDFs & write our own Analyzers which we then import into Cloudant?
Currently, you need to parse the text from the PDF yourself, using something like Tika, and store that in a field within your document which is then indexed by search. Custom analysers are unlikely to be supported, though support for indexing binary files may arrive at some point.
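A minimal sketch of the "store the extracted text in a field" step, in Python. The extractor is passed in as a callable so the snippet does not depend on any particular library; with the tika-python package, something like `lambda b: parser.from_buffer(b)["content"]` might serve, but treat that as an assumption to check against its docs. The field name `pdf_text` is hypothetical.

```python
def with_extracted_text(doc, pdf_bytes, extract_text, field="pdf_text"):
    """Return a copy of `doc` with the PDF's text stored in `field`.

    `extract_text` is any callable that takes the PDF bytes and returns
    a string, or None if no text could be extracted.
    """
    text = extract_text(pdf_bytes) or ""
    updated = dict(doc)  # shallow copy, so the caller's dict is not modified
    # Collapse runs of whitespace so the stored field stays compact.
    updated[field] = " ".join(text.split())
    return updated


if __name__ == "__main__":
    # Stand-in extractor for demonstration; a real one would call Tika.
    fake_extractor = lambda pdf_bytes: "Quarterly   report\n2013"
    print(with_extracted_text({"_id": "report-1"}, b"%PDF-1.4", fake_extractor))
```

The updated document is then saved back to Cloudant, where a search index function can index the text field like any other.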
What is the best possible approach if we would like to support searching the contents of PDF 'attachments' in CouchDB documents that get replicated from Cloudant to local CouchDB instances on mobile devices?
It depends on the platform. As you'd have already parsed out the text of the PDF for use in Cloudant's search, you could use the local search APIs on the device. Unfortunately these are a bit thin on the ground, and I've not had time to try any myself. It's a shame SearchKit is not available on iOS yet, so far as I can tell.
Or you could search online using Cloudant's search, then pull the matching documents from the local database, since the search results give you the doc IDs.
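The online-search-then-local-lookup idea could be sketched roughly as follows, in Python. It only builds the search URL and processes a response that has already been fetched, so no HTTP client is assumed. The account URL, database, design doc and index names are placeholders, and the local replica is modelled as a plain dict keyed by doc ID.

```python
from urllib.parse import quote, urlencode


def search_url(base_url, db, ddoc, index, query, limit=25):
    """Build the URL for a Cloudant search query against one index."""
    segments = (db, "_design", ddoc, "_search", index)
    path = "/".join(quote(s, safe="") for s in segments)
    params = urlencode({"q": query, "limit": limit})
    return base_url.rstrip("/") + "/" + path + "?" + params


def matching_ids(search_response):
    """Extract doc IDs from a decoded search response (its "rows" list)."""
    return [row["id"] for row in search_response.get("rows", [])]


def load_local(local_docs, ids):
    """Fetch documents from the local replica, skipping IDs that have
    not been replicated to the device yet."""
    return [local_docs[i] for i in ids if i in local_docs]


if __name__ == "__main__":
    # Placeholder account, database, design doc and index names.
    print(search_url("https://example.cloudant.com", "docs",
                     "search", "content", "invoice"))
```

On a device, `load_local` would be replaced by whatever get-by-ID call the local database offers. Skipping missing IDs is one possible design choice for documents that have not replicated yet.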
Upvotes: 3