user7014786

Spring Data Elasticsearch only returning exact matches

I made a simple project to test free text search using Spring Data Elasticsearch. Here is my entity:

@Document(indexName = "flag", type = "flag")
public class Flag {

    @Id
    private String flagCode;

    @MultiField(mainField = @Field(type = FieldType.String))
    private String flagName;

    // setters and getters
}

Now, my search operation looks like:

SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(fuzzyQuery("flagName", freeText))
        .build();
List<Flag> flags = elasticsearchTemplate.queryForList(searchQuery, Flag.class);

With this, I only get objects whose flagName exactly matches the search text (freeText), whereas I want a result to count as a match if at least 2 characters match.

If relevant, my DataConfig file looks like:

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.shubham.dao")
@ComponentScan(basePackages = "com.shubham.data")
public class DataConfig {

    @Value("${elasticsearch.clustername}")
    private String clusterName;

    @Value("${elasticsearch.host}")
    private String elasticsearchHost;

    @Value("${elasticsearch.port}")
    private int elasticsearchPort;

    @Bean
    public Client client() throws Exception {

        Settings esSettings = Settings.settingsBuilder()
                .put("cluster.name", clusterName)
                .build();

        // https://www.elastic.co/guide/en/elasticsearch/guide/current/_transport_client_versus_node_client.html
        return TransportClient.builder()
                .settings(esSettings)
                .build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress.getByName(elasticsearchHost), elasticsearchPort));
    }

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() throws Exception {
        return new ElasticsearchTemplate(client());
    }
}

What might I be doing wrong? I am new to Elasticsearch.

Upvotes: 1

Views: 2134

Answers (1)

Val

Reputation: 217254

By default, the fuzzy query has a fuzziness setting of AUTO, which means that

  • if the term is 0 to 2 characters long, it must match exactly
  • if the term is 3 to 5 characters long, only one edit is allowed
  • above that, two edits are allowed
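As a rough illustration, the AUTO rule above can be written as a small function. This is only a sketch of the rule as described in this answer, not Elasticsearch's own code; the class and method names (AutoFuzziness, allowedEdits) are made up for the example.

```java
// Illustrative only: mirrors the AUTO rule described above.
// AutoFuzziness and allowedEdits are hypothetical names, not an Elasticsearch API.
final class AutoFuzziness {

    // Maximum number of edits allowed for a search term of the given length.
    static int allowedEdits(int termLength) {
        if (termLength <= 2) {
            return 0; // 0 to 2 characters: must match exactly
        }
        if (termLength <= 5) {
            return 1; // 3 to 5 characters: one edit allowed
        }
        return 2;     // longer terms: two edits allowed
    }

    public static void main(String[] args) {
        for (String term : new String[] {"In", "Ind", "Indi", "India", "Indian"}) {
            System.out.println(term + " -> " + allowedEdits(term.length()) + " edit(s) allowed");
        }
    }
}
```

Note that the allowance depends on the length of the term you search for, not on the indexed token.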

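In your case, when you index India, the token india (lowercase) will be indexed. The fuzzy query does not analyze the search text, so the capital I already counts as one edit against the indexed token, and every missing character counts as another:

  • Searching for In implies 4 edits (0 allowed) => no match
  • Searching for Ind implies 3 edits (1 allowed) => no match
  • Searching for Indi implies 2 edits (1 allowed) => no match
  • Searching for India implies 1 edit (1 allowed) => match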
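If you want to check the edit counts above yourself, here is a minimal sketch using the plain Levenshtein distance. It is an assumption that this is close enough for these inputs: as far as I know Elasticsearch also treats a transposition of two adjacent characters as a single edit by default, which does not come into play here. EditDistance is a made-up helper name.

```java
// Illustrative only: plain Levenshtein distance, used to reproduce the edit counts
// listed above. EditDistance is a hypothetical helper, not part of any library.
final class EditDistance {

    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b. The comparison is case-sensitive.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexedToken = "india"; // what ends up in the index for "India"
        for (String term : new String[] {"In", "Ind", "Indi", "India"}) {
            // prints 4, 3, 2 and 1 edits respectively
            System.out.println(term + " -> " + levenshtein(term, indexedToken) + " edit(s)");
        }
    }
}
```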

You probably need to index your data with edge n-grams instead of relying on fuzziness, which is not at all suited to searching by prefix, which is what you seem to want.
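To show what edge n-grams buy you, here is a plain-Java sketch of the idea, not the Elasticsearch analyzer itself. In Elasticsearch you would configure this in the index settings (for example an edge_ngram token filter with min_gram and max_gram) rather than write it by hand. The names EdgeNGrams, edgeNGrams and prefixMatches, and the gram sizes 2 and 10, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: what an edge n-gram analysis chain would roughly produce at index time.
// EdgeNGrams and its methods are hypothetical names, not an Elasticsearch or Spring Data API.
final class EdgeNGrams {

    // Lowercases the token and returns its prefixes of length minGram..maxGram.
    // Assumes minGram >= 1.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    // A lowercased query matches if it equals one of the indexed prefixes.
    static boolean prefixMatches(String indexedValue, String query, int minGram, int maxGram) {
        return edgeNGrams(indexedValue, minGram, maxGram).contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("India", 2, 10));          // [in, ind, indi, india]
        System.out.println(prefixMatches("India", "In", 2, 10)); // true
        System.out.println(prefixMatches("India", "I", 2, 10));  // false, below minGram
    }
}
```

With a minimum gram length of 2, any query of at least 2 leading characters becomes an exact term match against one of the indexed prefixes, so no fuzziness is needed.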

Upvotes: 1
