MaatDeamon

Reputation: 9771

Browse functionality Indexing strategy

In version 1.8.x, could someone explain which indexing framework or technique is used for the Browse index? That is, what does it rely on: SOLR, the database directly, or a cache?

There seems to be a difference when Discovery is activated (as per the documentation). In that case, Solr is supposed to be used.

Hence my questions here are:

  1. In the first scenario (without Discovery), what does the Browse functionality use to index the metadata, or whatever else it indexes?

  2. In the second scenario (with Discovery), same question.

  3. Finally, if someone could point me to the configuration or the class files involved, that would be great. I'm especially curious to know how adding Discovery changes the indexing strategy of the Browse functionality, and maybe of the search itself as well. I can investigate things myself, but having the big picture and some indication of where to look would be a real boost.

Upvotes: 0

Views: 202

Answers (1)

terrywb

Reputation: 3956

In DSpace 1.x and 3.x, the browse index was a Lucene index stored in [dspace-install]/search. Metadata from the database and full text content from bitstreams are extracted into Lucene.

If the Discovery index is not enabled, then the search box will query Lucene. If the Discovery index is enabled, the search box queries SOLR.

In DSpace, the Lucene index has several limitations compared to the SOLR index. The SOLR index is aware of user authorizations and presents authorization-aware results. The Lucene index may return items that a user cannot access.
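To make the authorization point concrete, here is a toy model in Python. It is not DSpace code and does not use the Lucene or SOLR APIs; the names `IndexedItem`, `read_groups`, `search_unfiltered` and `search_authorized` are made up for illustration. It only shows the general idea: an index that stores who may read each item can filter results at query time, while an index without that information returns every match.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """One indexed record (hypothetical structure, not a DSpace class)."""
    title: str
    # Assumption for this sketch: the index stores the groups allowed to read the item.
    read_groups: set = field(default_factory=set)


def search_unfiltered(index, term):
    """Match on the title only, ignoring who is asking."""
    return [item.title for item in index if term.lower() in item.title.lower()]


def search_authorized(index, term, user_groups):
    """Match on the title, then keep only items readable by one of the user's groups."""
    return [
        item.title
        for item in index
        if term.lower() in item.title.lower() and item.read_groups & user_groups
    ]


index = [
    IndexedItem("Open thesis on indexing", {"Anonymous"}),
    IndexedItem("Embargoed thesis on indexing", {"Staff"}),
]

# The unfiltered search returns both titles, including one an anonymous user cannot open.
print(search_unfiltered(index, "thesis"))
# The authorization-aware search returns only what the anonymous user may read.
print(search_authorized(index, "thesis", {"Anonymous"}))
```

In this sketch the anonymous user sees the embargoed title in the unfiltered results but not in the authorized ones, which is the kind of difference described above.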

The following link should point you to the configuration files: https://wiki.duraspace.org/display/DSDOC18/Discovery

Upvotes: 1
