Reputation: 679
I want to cluster my indexed data in Solr. Each Solr document contains the following fields: id, title, url.
I have read the Solr 7.7 docs, and the clustering algorithm described there is applied only to the search results of a single query. What I need is clustering of the full index, based on the document title.
Could anyone help?
Upvotes: 0
Views: 256
Reputation: 66
Results clustering was removed in Solr 8.x. The reason cited on the Solr website was “The search results clustering contrib (Carrot2) has been removed from 8.x Solr due to lack of Java 1.8 compatibility in the dependency that provides online clustering of search results.”
Here is how I got it to work on JVM 11. All necessary files can be downloaded from this GitHub repo.
Tested with Java 11.
Upvotes: 0
Reputation: 1231
As far as I'm aware, there's no out-of-the-box plugin for clustering the whole Solr index.
If you have some background in machine learning, have a look at Apache Mahout; it should be suitable for clustering a dataset of this size. Alternatively, there's a commercially licensed Carrot2 spin-off we develop called Lingo4G, which is designed for clustering large collections of text. In both cases, however, there is no direct integration with Solr; you'd need to handle the integration on your own.
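Whichever tool does the clustering, the first part of that integration is getting every title out of the index. A minimal sketch of that export step, using Solr's cursorMark deep paging over the `/select` handler, might look like the following. The base URL, the `export_titles` and `fetch_page` names, and the page size are assumptions made for illustration; the field names `id` and `title` come from the question, with `id` assumed to be the uniqueKey field.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Issue one /select request and return the decoded JSON response."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
        return json.load(resp)


def export_titles(base_url, rows=500, fetch=fetch_page):
    """Yield (id, title) for every document, paging with cursorMark.

    `fetch` is injectable so the paging logic can be exercised without
    a running Solr instance (hypothetical helper, not part of Solr).
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "fl": "id,title",
            # cursorMark requires a sort that includes the uniqueKey field
            "sort": "id asc",
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        }
        data = fetch(base_url, params)
        for doc in data["response"]["docs"]:
            # title may be a list if the field is multi-valued in the schema
            yield doc["id"], doc.get("title")
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor once all results are consumed
            break
        cursor = next_cursor
```

Calling something like `export_titles("http://localhost:8983/solr/mycollection")` (a hypothetical core URL) would stream the pairs, which could then be written to whatever input format the clustering tool expects. Cursor-based paging is used here rather than increasing `start` offsets because large offsets become expensive on big indexes.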
Upvotes: 2