Ashish Pancholi

Reputation: 4659

How to match exact value in elasticsearch?

I have indexed the metadata of three files, all of MIME type "text/plain".

But when I search with other MIME types, they also match these "text/plain" documents!

Here are the MIME types that match the "text/plain" documents, with the total hits and max score for each:

***********************************
1. Mime-Type text/vnd.motorola.reflex
2. Total Hits 3
3. Max Score 0.07154637
***********************************
1. Mime-Type text/vnd.ms-mediapackage
2. Total Hits 3
3. Max Score 0.034633614
***********************************
1. Mime-Type text/vnd.net2phone.commcenter.command
2. Total Hits 3
3. Max Score 0.07154637
***********************************
1. Mime-Type text/plain
2. Total Hits 3
3. Max Score 0.629606
***********************************

I want the MIME type to match exactly, so that only the last query (text/plain) returns hits. Note that it does get a higher max score than all of the ones above.

Search Code:

Called with query = "text/plain" and filter = "mimeType":

public long getHitsCount(String query, String filter, Project project) {
        try {
            /*TermQueryBuilder QueryBuilder =  new TermQueryBuilder(filter, smartEscapeQuery(query));*/
            /* QueryStringQueryBuilder QueryBuilder = new QueryStringQueryBuilder(smartEscapeQuery(query)).field(filter);*/
            // match query: the (escaped) query string is analyzed before searching
            MatchQueryBuilder matchQuery = QueryBuilders.matchQuery(filter, smartEscapeQuery(query));
            QueryBuilder qb = QueryBuilders
                    .boolQuery()
                    .must(matchQuery);

            SearchRequestBuilder requestBuilder = client.prepareSearch()
                    .setIndices(getDomainIndexId(project))
                    .setTypes(getProjectTypeId(project))
                    .setSearchType(SEARCH_TYPE)
                    .setQuery(qb);

            SearchResponse response = requestBuilder.execute().actionGet(ES_TIMEOUT_MS);
            SearchHits hits = response.getHits();
            // total hits is never negative, so it can be returned directly
            return hits.getTotalHits();
        } catch (IndexMissingException ex) {
            // the index does not exist (yet): treat it as zero hits
            return 0L;
        }
    }

    /**
     * Escape the string from bad chars for the search
     *
     * @param str the String that should be escaped
     * @return an escaped String
     */
    @SuppressWarnings({"ConstantConditions"})
    private static String smartEscapeQuery(String str) {
        if (StringUtils.isBlank(str)) {
            return "";
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c ==
                    '(' || c == ')' || c == ':'
                    || c == '^' || c == '[' || c == ']' || c == '\"'
                    || c == '{' || c == '}' || c == '~' || c == '/'
                    || c == '?' || c == '|' || c == '&' || c == ';'
                    || (!Character.isSpaceChar(c) &&
                    Character.isWhitespace(c))) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
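smartEscapeQuery appears to target Lucene query-string syntax (the commented-out QueryStringQueryBuilder above). Match and term queries do not interpret that syntax, so the backslash it inserts before '/' is sent as part of the value itself. A minimal standalone check, assuming the method is copied into a hypothetical `EscapeDemo` class and that `StringUtils.isBlank` is replaced by an equivalent null/whitespace check to drop the Commons Lang dependency:

```java
public class EscapeDemo {
    // Copy of smartEscapeQuery from the question. Assumption: StringUtils.isBlank
    // is replaced by an equivalent check (null, empty, or whitespace only).
    static String smartEscapeQuery(String str) {
        if (str == null || str.chars().allMatch(Character::isWhitespace)) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '('
                    || c == ')' || c == ':' || c == '^' || c == '['
                    || c == ']' || c == '\"' || c == '{' || c == '}'
                    || c == '~' || c == '/' || c == '?' || c == '|'
                    || c == '&' || c == ';'
                    || (!Character.isSpaceChar(c) && Character.isWhitespace(c))) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "text/plain";
        String sent = smartEscapeQuery(stored);
        System.out.println(sent);                // prints text\/plain
        System.out.println(sent.equals(stored)); // prints false
    }
}
```

The value that reaches Elasticsearch is therefore `text\/plain`, which is exactly what the generated JSON below shows (`"text\\/plain"` once JSON-escaped).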

Match Query:

    {
      "bool" : {
        "must" : {
          "match" : {
            "mimeType" : {
              "query" : "text\\/plain",
              "type" : "boolean"
            }
          }
        }
      }
    }
Result: 3 Hits

Term Query:

    {
      "bool" : {
        "must" : {
          "term" : {
            "mimeType" : "text\\/plain"
          }
        }
      }
    }

Result: 0 Hits
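A term query is not analyzed: its value has to equal one indexed term character for character. The toy lookup below (hypothetical class `TermLookupDemo`, not Elasticsearch code) sketches why 0 hits is expected here. It assumes that an analyzed `mimeType` field stores "text/plain" as the two terms "text" and "plain" (roughly what the standard analyzer produces), while a not_analyzed field would store the whole value as a single term:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TermLookupDemo {
    // Assumption: terms an analyzed (standard analyzer) field holds for "text/plain".
    static final Set<String> ANALYZED_TERMS = new HashSet<>(Arrays.asList("text", "plain"));
    // A not_analyzed field keeps the whole value as one term.
    static final Set<String> NOT_ANALYZED_TERMS = new HashSet<>(Arrays.asList("text/plain"));

    // Term query semantics: exact equality with an indexed term, no analysis.
    static boolean termMatches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String escaped = "text\\/plain"; // what smartEscapeQuery produces
        String raw = "text/plain";
        System.out.println(termMatches(ANALYZED_TERMS, escaped));     // false
        System.out.println(termMatches(ANALYZED_TERMS, raw));         // false
        System.out.println(termMatches(NOT_ANALYZED_TERMS, escaped)); // false
        System.out.println(termMatches(NOT_ANALYZED_TERMS, raw));     // true
    }
}
```

Under these assumptions only the combination of a not_analyzed field and the unescaped value finds the documents.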

I have tried both TermQuery and MatchQuery, but neither works. I am using (Apache Tika's) AutoDetectParser while indexing.

How can I match the exact value in Elasticsearch, so that in the example above only "text/plain" matches and none of the other MIME types do?

Upvotes: 0

Views: 1504

Answers (1)

sven.kwiotek

Reputation: 1479

In your first example you use a match query, so your query string is analyzed before the search (effectively text OR plain). Which analyzer did you use at index time? Would it help to make this field "not_analyzed"? Your second example uses a term query, which is not analyzed; it also requires a "not_analyzed" field to match the whole value.
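To make the first point concrete, here is a rough simulation of the match query's default OR behaviour against an analyzed field versus exact matching on a not_analyzed field. It is a sketch, not Elasticsearch code: the class name `AnalyzedMatchDemo` is hypothetical, and `analyze` is a crude stand-in that lowercases and splits on anything other than letters, digits and '.', which is an assumption about roughly how the standard analyzer treats MIME types. Relevance scoring is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzedMatchDemo {
    // Crude analyzer stand-in: "text/vnd.motorola.reflex" -> {text, vnd.motorola.reflex}.
    static Set<String> analyze(String value) {
        Set<String> tokens = new HashSet<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}.]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Match query with the default OR operator: any shared token is enough.
    static boolean matchQuery(String fieldValue, String query) {
        Set<String> shared = analyze(fieldValue);
        shared.retainAll(analyze(query));
        return !shared.isEmpty();
    }

    // not_analyzed field: the whole value is a single term, so equality is required.
    static boolean exactMatch(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    static int countHits(List<String> docs, String query, boolean analyzed) {
        int hits = 0;
        for (String doc : docs) {
            if (analyzed ? matchQuery(doc, query) : exactMatch(doc, query)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Three indexed documents, all "text/plain", as in the question.
        List<String> docs = Arrays.asList("text/plain", "text/plain", "text/plain");
        for (String q : Arrays.asList("text/vnd.motorola.reflex", "text/plain")) {
            System.out.println(q + " analyzed=" + countHits(docs, q, true)
                    + " not_analyzed=" + countHits(docs, q, false));
        }
    }
}
```

This prints `text/vnd.motorola.reflex analyzed=3 not_analyzed=0` and `text/plain analyzed=3 not_analyzed=3`: the shared token "text" is enough for every text/* type to hit all three documents, which matches the 3-hit results in the question. In a real index the corresponding change would be mapping `mimeType` with `"index": "not_analyzed"` (Elasticsearch 1.x/2.x; a `keyword` field in 5.x and later), reindexing, and sending a term query with the unescaped value.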

Upvotes: 1
