Soufiane Roui

Reputation: 679

Offline clustering using Solr?

I want to cluster my indexed data in Solr. Each Solr document contains the following fields: id, title, url.

I have read the Solr 7.7 docs, and the clustering algorithm described there is applied only to the search results of a single query. What I need is clustering of the full index, based on the document title.

Could anyone help?

Upvotes: 0

Views: 256

Answers (2)

rscavilla

Reputation: 66

Results clustering was removed in Solr 8.x. The reason cited on the Solr website was “The search results clustering contrib (Carrot2) has been removed from 8.x Solr due to lack of Java 1.8 compatibility in the dependency that provides online clustering of search results.”

Here is how I got it to work on JVM 11. All necessary files can be downloaded from this GitHub repo!

  1. Follow the instructions for installing the clustering contrib: https://solr.apache.org/guide/8_1/result-clustering.html
  2. Add solr-clustering-8.7.0.jar to the /solr-8.x.x/dist directory (I tested this jar up to Solr version 8.11.1).
  3. Create a /solr-8.x.x/contrib/clustering directory and copy in the files marked for contrib.
  4. Restart Solr.

Tested with Java 11.

Upvotes: 0

Stanislaw Osinski

Reputation: 1231

As far as I'm aware, there's no out-of-the-box plugin for clustering the whole Solr index.

If you have some background in machine learning, have a look at Apache Mahout; it should be suitable for clustering a dataset of this size. Alternatively, there's a commercially licensed Carrot2 spin-off we develop called Lingo4G, which is designed for clustering large collections of text. In both cases, however, there is no direct integration with Solr, so you'd need to handle the integration on your own.
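For the Solr side of that integration, one possible starting point is to export every id/title pair with cursor-based deep paging and hand the titles to whichever clustering tool you choose. The following is only a minimal sketch, not part of either tool: `fetch_page` and `iter_docs` are hypothetical helper names, the core URL is a placeholder, and the code assumes `id` is the uniqueKey and that the `/select` handler is reachable over HTTP.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, cursor, rows=500):
    """Fetch one page of id/title pairs from a Solr /select handler.

    base_url is assumed to point at a core, e.g. http://localhost:8983/solr/mycore
    (placeholder; adjust to your installation).
    """
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": "id,title",
        "sort": "id asc",      # cursor paging requires a sort that includes the uniqueKey
        "rows": rows,
        "cursorMark": cursor,  # "*" requests the first page
        "wt": "json",
    })
    with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)


def iter_docs(fetch, start_cursor="*"):
    """Yield every document, following nextCursorMark until it stops changing.

    fetch is any callable that takes a cursor string and returns the parsed
    Solr JSON response for that page.
    """
    cursor = start_cursor
    while True:
        page = fetch(cursor)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # Solr returns the same cursor once results are exhausted
            break
        cursor = next_cursor
```

To use it, call something like `iter_docs(lambda c: fetch_page("http://localhost:8983/solr/mycore", c))` and write each document's `id` and `title` to a file in the input format your clustering tool expects. Note that `title` comes back as a list if the field is multi-valued in your schema.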

Upvotes: 2
