ElasticSearch: Highlighting with Stemming

Question

I have read this question and attempted to understand the documentation here, but this is complicated.

The problem (I think):

[update 1]

I am using Scala for my code and interfacing with the ES High Level Java API.

I have a stemming analyzer configured. If I search for responsibilities, I get results for both responsibilities and responsibility. That's great.

BUT

Only the documents containing the term responsibilities return highlights. This is because the search runs against the stemmed content, i.e., responsib. However, the highlight runs against the unstemmed content. Hence, it finds responsibilities, which was the search term, but not responsibility, which wasn't.

If I set the highlighter to highlight on the stemmed content, it returns nothing at all. I guess this is because it is comparing responsib with responsibilities.
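
To illustrate what I think is happening, here is a small self-contained sketch with no Elasticsearch involved. Note that `toyStem` is a hypothetical stand-in that only strips two suffixes; it is not the algorithm the real english stemmer uses, and the stem responsib is simply my guess from above. The tokenizing helper is likewise only a rough approximation.

```scala
object StemmingMismatchSketch {
  // Hypothetical toy stemmer: strips two suffixes so that both word forms
  // collapse to "responsib". This is NOT the real english stemmer.
  def toyStem(token: String): String = {
    val t = token.toLowerCase
    if (t.endsWith("ilities")) t.dropRight("ilities".length)
    else if (t.endsWith("ility")) t.dropRight("ility".length)
    else t
  }

  // Rough stand-in for tokenizing plus lowercasing: split on non-letters.
  def tokens(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Stemmed field: document tokens and query term are both stemmed.
  def matchesStemmed(text: String, query: String): Boolean =
    tokens(text).map(toyStem).contains(toyStem(query))

  // Unstemmed field: tokens are compared verbatim with the query term.
  def matchesUnstemmed(text: String, query: String): Boolean =
    tokens(text).contains(query.toLowerCase)
}
```

With this sketch, a document containing only responsibility matches the query responsibilities through `matchesStemmed` but not through `matchesUnstemmed`, which mirrors the difference I see between search hits and highlights.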

Search

I am using the Java high level API. The problem is not the code itself. Currently, I am highlighting only the content field, which returns highlights only for responsibilities. Highlighting content.english seems to return nothing.

import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

// Builds a highlighter that targets only the unstemmed "content" field.
private def buildHighlighter(): HighlightBuilder = {
  val highlightBuilder = new HighlightBuilder
  val highlightContent = new HighlightBuilder.Field("content")
  highlightContent.highlighterType("unified")
  highlightBuilder.field(highlightContent)
  highlightBuilder
}
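
For reference, this is a sketch of the variant that targets the stemmed sub-field instead. The names `HighlighterSketch` and `buildStemmedHighlighter` are made up for illustration, and turning off `requireFieldMatch` is only an assumption about what might matter here, not a confirmed fix. It needs the Elasticsearch client library on the classpath.

```scala
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

object HighlighterSketch {
  // Hypothetical variant: highlight the stemmed sub-field "content.english"
  // rather than the unstemmed "content" field.
  def buildStemmedHighlighter(): HighlightBuilder = {
    val stemmed = new HighlightBuilder.Field("content.english")
    stemmed.highlighterType("unified")
    // Assumption: also allow terms from queries on other fields to be highlighted.
    stemmed.requireFieldMatch(java.lang.Boolean.FALSE)
    new HighlightBuilder().field(stemmed)
  }
}
```

Whether this produces highlights presumably depends on which fields the query actually targets and on the Elasticsearch version in use.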

Mapping (abridged)

{
	"settings": {
		"number_of_shards": 3,
		"analysis": {
			"filter": {
				"english_stop": {
					"type": "stop",
					"stopwords": "_english_"
				},
				"english_keywords": {
					"type": "keyword_marker",
					"keywords": []
				},
				"english_stemmer": {
					"type": "stemmer",
					"language": "english"
				},
				"english_possessive_stemmer": {
					"type": "stemmer",
					"language": "possessive_english"
				}
			},
			"analyzer": {
				"english": {
					"tokenizer": "standard",
					"filter": [
						"english_possessive_stemmer",
						"lowercase",
						"english_stop",
						"english_keywords",
						"english_stemmer"
					]
				}
			}
		}
	},
	"mappings": {
		"_doc": {
			"properties": {
				"title": {
					"type": "text",
					"fields": {
						"english": {
							"type": "text",
							"analyzer": "english"
						}
					}
				},
				"content": {
					"type": "text",
					"fields": {
						"english": {
							"type": "text",
							"analyzer": "english"
						}
					}
				}
			}
		}
	}
}

[update 2]

Scala code to implement search:

def searchByField(indices: Seq[ESIndexName], terms: Seq[(String, String)], size: Int = 20): SearchResponse = {
  val searchRequest = new SearchRequest
  searchRequest.indices(indices.map(idx => idx.completeIndexName()): _*)
  searchRequest.source(buildTargetFieldsMatchQuery(terms, size))
  searchRequest.indicesOptions(IndicesOptions.strictSingleIndexNoExpandForbidClosed())

  client.search(searchRequest, RequestOptions.DEFAULT)
}

and the query is built as follows:

private def buildTargetFieldsMatchQuery(termsByField: Seq[(String, String)], size: Int): SearchSourceBuilder = {
  val query = new BoolQueryBuilder

  // standardAnalyzer is a field-name suffix defined elsewhere (not shown here).
  termsByField.foreach {
    case (field, term) =>
      logger.debug(field + " should have " + term)
      if (field == "content") {
        // Match against both the suffixed sub-field and the plain field.
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase))
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      } else if (field == "title") {
        // Note: .boost without an argument only reads the current boost; it does not set one.
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase)).boost
      } else {
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
  }

  val sourceBuilder: SearchSourceBuilder = new SearchSourceBuilder()
  sourceBuilder.query(query)
  sourceBuilder.from(0)
  sourceBuilder.size(size)
  sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS))
  // highlighter(...) returns the SearchSourceBuilder, which is the method's result.
  sourceBuilder.highlighter(buildHighlighter())
}
