Spring Data Elasticsearch only returning exact matches

Question

I made a simple project to test free text search using Spring Data Elasticsearch. Here is my entity:

@Document(indexName = "flag", type = "flag")
public class Flag {

    @Id
    private String flagCode;

    @MultiField(mainField = @Field(type = FieldType.String))
    private String flagName;

    // setters and getters
}

Now, my search operation looks like:

SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(fuzzyQuery("flagName", freeText))
            .build();
List<Flag> flags = elasticsearchTemplate.queryForList(searchQuery, Flag.class);

With this, I only get objects whose flagName exactly matches freeText, whereas I want a record to match as soon as at least 2 characters match.

If relevant, my DataConfig file looks like:

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.shubham.dao")
@ComponentScan(basePackages = "com.shubham.data")
public class DataConfig {

    @Value("${elasticsearch.clustername}")
    private String clusterName;

    @Value("${elasticsearch.host}")
    private String elasticsearchHost;

    @Value("${elasticsearch.port}")
    private int elasticsearchPort;

    @Bean
    public Client client() throws Exception {

        Settings esSettings = Settings.settingsBuilder()
                .put("cluster.name", clusterName)
                .build();

        // https://www.elastic.co/guide/en/elasticsearch/guide/current/_transport_client_versus_node_client.html
        return TransportClient.builder()
                .settings(esSettings)
                .build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress.getByName(elasticsearchHost), elasticsearchPort));
    }

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() throws Exception {
        return new ElasticsearchTemplate(client());
    }
}

What could I be doing wrong? I am new to Elasticsearch.

Val · Accepted Answer

By default, the fuzzy query has a fuzziness setting of AUTO, which means that:

if the term is 0 to 2 characters long, it must match exactly
if the term is 3 to 5 characters long, only one edit is allowed
if the term is longer than 5 characters, two edits are allowed

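In your case, when you index India, the token india (lowercase) will be indexed. The fuzzy query does not analyze the search text, so the uppercase I already counts as one edit against the indexed token.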

Searching for In implies 4 edits => no match
Searching for Ind implies 3 edits => no match
Searching for Indi implies 2 edits => no match
Searching for India implies 1 edit => match
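
The edit counts above can be checked with a small standalone sketch. It is not Elasticsearch code: FuzzyAutoDemo, allowedEdits, distance and matches are hypothetical names, the thresholds are the AUTO rules quoted in this answer, and the comparison is case-sensitive on the assumption that the search text is not lowercased. It uses plain Levenshtein distance and ignores transpositions, which does not change the result for these four terms.

```java
import java.util.List;

// Illustration only: mimics the AUTO fuzziness rules described in the answer.
final class FuzzyAutoDemo {

    // Maximum number of edits AUTO allows for a search term of this length.
    static int allowedEdits(String term) {
        int n = term.length();
        if (n <= 2) return 0;   // 0..2 characters: must match exactly
        if (n <= 5) return 1;   // 3..5 characters: one edit
        return 2;               // longer terms: two edits
    }

    // Case-sensitive Levenshtein distance (insert, delete, substitute).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the search term is within the allowed distance of the indexed token.
    static boolean matches(String searchTerm, String indexedToken) {
        return distance(searchTerm, indexedToken) <= allowedEdits(searchTerm);
    }

    public static void main(String[] args) {
        String indexedToken = "india"; // what the analyzer stores for "India"
        for (String term : List.of("In", "Ind", "Indi", "India")) {
            System.out.printf("%s: %d edits needed, %d allowed, match=%b%n",
                    term, distance(term, indexedToken), allowedEdits(term), matches(term, indexedToken));
        }
    }
}
```

Only India falls within its allowed distance, which is why the query behaves like an exact match.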

You probably need to tokenize your data with edge-ngram instead of relying on fuzziness, which is not suited to prefix searches like the one you seem to want.
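
To see why edge n-grams fit the "at least 2 characters" requirement, here is a rough plain-Java sketch of the tokens such a tokenizer would produce. It is not the Elasticsearch implementation: EdgeNgramDemo, edgeNgrams and prefixMatches are hypothetical names, and the minimum gram length of 2 and maximum of 10 are assumptions chosen for this question. In a real index these values would go into a custom analyzer in the index settings, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: emulates lowercased edge n-grams of a single word.
final class EdgeNgramDemo {

    // Returns the prefixes of the lowercased text with lengths minGram..maxGram.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        String token = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // A search hits when its lowercased form equals one of the indexed prefixes.
    static boolean prefixMatches(String indexedValue, String search) {
        return edgeNgrams(indexedValue, 2, 10).contains(search.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("India", 2, 10));
        for (String search : List.of("I", "In", "Ind", "Indi", "India", "Ina")) {
            System.out.println(search + " -> " + prefixMatches("India", search));
        }
    }
}
```

With these assumed settings, In, Ind, Indi and India all hit the indexed value, while a single character or a non-prefix such as Ina does not.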
