Reputation: 991
I am using the latest version of Elasticsearch (in Docker) and a Spring Boot app (also the latest version) in which I try to search for similar documents. My document class has a String field:
@Field(
    name = "description",
    type = FieldType.Text,
    fielddata = true,
    analyzer = "icu_analyzer",
    termVector = TermVector.with_positions_offsets,
    similarity = Similarity.BM25)
private String description;
I get plenty of results for my query when I use the built-in searchSimilar method:
public Page<BookInfo> findSimilarDocuments(final long id) {
    return bookInfoRepository.findById(id)
        .map(bookInfo -> bookInfoRepository.searchSimilar(bookInfo, new String[]{"description"}, pageable))
        .orElse(Page.empty());
}
However, I have no idea how similar the documents are, because it is just a page of my Document object. It would be great to be able to see the similarity score, or to set a similarity threshold when performing the query. Is there something different that I should be doing?
Upvotes: 0
Views: 1669
Reputation: 19421
I just had a look: the existing method Page<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) was added to the ElasticsearchRepository interface back in 2013. It returns a plain Page<T>, which does not contain any score information.
Since Spring Data Elasticsearch version 4.0 the score information is available, but when you look at the implementation you can see that it is stripped from the return value in order to adhere to the method signature from the interface:
public Page<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) {
    Assert.notNull(entity, "Cannot search similar records for 'null'.");
    Assert.notNull(pageable, "'pageable' cannot be 'null'");
    MoreLikeThisQuery query = new MoreLikeThisQuery();
    query.setId(stringIdRepresentation(extractIdFromBean(entity)));
    query.setPageable(pageable);
    if (fields != null) {
        query.addFields(fields);
    }
    SearchHits<T> searchHits = execute(operations -> operations.search(query, entityClass, getIndexCoordinates()));
    SearchPage<T> searchPage = SearchHitSupport.searchPageFor(searchHits, pageable);
    return (Page<T>) SearchHitSupport.unwrapSearchHits(searchPage);
}
You could implement a custom repository fragment (see https://docs.spring.io/spring-data/elasticsearch/docs/4.2.6/reference/html/#repositories.custom-implementations) that provides its own implementation of the method and returns a SearchPage<T>:
public SearchPage<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) {
    Assert.notNull(entity, "Cannot search similar records for 'null'.");
    Assert.notNull(pageable, "'pageable' cannot be 'null'");
    MoreLikeThisQuery query = new MoreLikeThisQuery();
    query.setId(stringIdRepresentation(extractIdFromBean(entity)));
    query.setPageable(pageable);
    if (fields != null) {
        query.addFields(fields);
    }
    SearchHits<T> searchHits = execute(operations -> operations.search(query, entityClass, getIndexCoordinates()));
    SearchPage<T> searchPage = SearchHitSupport.searchPageFor(searchHits, pageable);
    return searchPage;
}
A SearchPage<T> is a page containing SearchHit<T> instances; these contain the entity and the additional information like the score.
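Once you have the hits with their scores, a threshold like the one asked for in the question can be applied on the client side. The following is only a minimal sketch of that idea, not part of the library: ScoredHit is a hypothetical stand-in for SearchHit<T> so that the snippet compiles without Spring on the classpath, and SimilarityFilter and atLeast are made-up names. With the real types you would iterate over searchPage.getContent() and call getScore() and getContent() on each SearchHit<T>.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Spring Data Elasticsearch's SearchHit<T>.
// With the real type, use hit.getContent() and hit.getScore() instead
// of the record accessors used below.
record ScoredHit<T>(T content, float score) {}

final class SimilarityFilter {

    private SimilarityFilter() {
        // static helper only, not meant to be instantiated
    }

    // Keeps only the hits whose score is at least minScore,
    // preserving the order in which the hits were returned.
    static <T> List<ScoredHit<T>> atLeast(List<ScoredHit<T>> hits, float minScore) {
        return hits.stream()
                .filter(hit -> hit.score() >= minScore)
                .collect(Collectors.toList());
    }
}
```

Calling SimilarityFilter.atLeast(hits, 4.0f) returns only the hits scoring 4.0 or higher. Keep in mind that Elasticsearch scores are not normalized to a fixed range, so any cutoff value is an assumption that has to be tuned for your index, and that filtering after paging can leave a page with fewer entries than the requested page size.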
Upvotes: 1