I made a simple project to test free text search
using Spring Data Elasticsearch. Here is my entity:
@Document(indexName = "flag", type = "flag")
public class Flag {

    @Id
    private String flagCode;

    @MultiField(mainField = @Field(type = FieldType.String))
    private String flagName;

    // setters and getters
}
Now, my search operation looks like:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(fuzzyQuery("flagName", freeText))
        .build();
List<Flag> flags = elasticsearchTemplate.queryForList(searchQuery, Flag.class);
With this, I only get objects whose flagName exactly matches the search text,
whereas I want a result to count as a match if at least 2 characters match.
If relevant, my DataConfig file looks like:
@Configuration
@EnableElasticsearchRepositories(basePackages = "com.shubham.dao")
@ComponentScan(basePackages = "com.shubham.data")
public class DataConfig {

    @Value("${elasticsearch.clustername}")
    private String clusterName;

    @Value("${elasticsearch.host}")
    private String elasticsearchHost;

    @Value("${elasticsearch.port}")
    private int elasticsearchPort;

    @Bean
    public Client client() throws Exception {
        Settings esSettings = Settings.settingsBuilder()
                .put("cluster.name", clusterName)
                .build();

        // https://www.elastic.co/guide/en/elasticsearch/guide/current/_transport_client_versus_node_client.html
        return TransportClient.builder()
                .settings(esSettings)
                .build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress.getByName(elasticsearchHost), elasticsearchPort));
    }

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() throws Exception {
        return new ElasticsearchTemplate(client());
    }
}
What could I be doing wrong? I am new to Elasticsearch.
Upvotes: 1
Views: 2134
Reputation: 217254
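By default, the fuzzy query has a fuzziness setting of AUTO, which means that
the number of edits allowed depends on the length of the term: terms of 0 to 2
characters must match exactly, terms of 3 to 5 characters allow one edit, and
terms longer than 5 characters allow two edits.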
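In your case, when you index India, the token india (lowercase) will be indexed.
The fuzzy query does not analyze the search text, so what you type is compared
against india as is, capital letter included.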
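To see how far a typed string is from the indexed token, the edit distance can be computed by hand. The following is a minimal sketch in plain Java, not the code Elasticsearch runs: the class and method names are made up for illustration, it uses plain Levenshtein distance (Lucene's fuzzy matching also counts a transposition as a single edit, which makes no difference for these inputs), and the AUTO thresholds are the ones given in the Elasticsearch documentation.

```java
class EditDistanceSketch {

    // Plain Levenshtein distance: insertions, deletions and substitutions
    // each cost one edit. The comparison is case-sensitive on purpose.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Edits allowed under fuzziness AUTO, based on the length of the typed term
    // (thresholds assumed from the Elasticsearch documentation).
    static int autoFuzziness(String term) {
        int n = term.length();
        if (n <= 2) {
            return 0;
        }
        return n <= 5 ? 1 : 2;
    }

    public static void main(String[] args) {
        String indexed = "india";
        for (String typed : new String[] {"In", "Ind", "Indi", "India"}) {
            int needed = levenshtein(typed, indexed);
            int allowed = autoFuzziness(typed);
            System.out.println(typed + ": needs " + needed + " edit(s), allowed " + allowed
                    + " => " + (needed <= allowed ? "match" : "no match"));
        }
    }
}
```

Only inputs whose distance to india is within the allowed number of edits would be returned by the fuzzy query.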
In implies 4 edits => no match
Ind implies 3 edits => no match
Indi implies 2 edits => no match
India implies 1 edit => match

You probably need to use edge-ngram to tokenize your data instead of using
fuzziness, which is not at all suited to searching for prefixes, as you seem to
want to do.
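To illustrate why edge n-grams fit prefix search, here is a rough sketch in plain Java of what an edge n-gram tokenizer conceptually produces. It is not Elasticsearch code: the class name, the method names and the min/max gram values of 2 and 10 are assumptions chosen to mirror the "at least 2 characters" requirement, and the lowercasing stands in for a lowercase filter applied at index and search time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EdgeNgramSketch {

    // Builds the prefixes of the lowercased text whose length lies between
    // minGram and maxGram, which is the idea behind edge n-gram tokenization.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        String token = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // A typed string matches when, once lowercased, it equals one of the
    // indexed prefixes. The values 2 and 10 are hypothetical settings.
    static boolean matches(String typed, String indexedValue) {
        return edgeNgrams(indexedValue, 2, 10).contains(typed.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("India", 2, 10));
        for (String typed : new String[] {"I", "In", "Ind", "Indi", "India", "dia"}) {
            System.out.println(typed + " => " + (matches(typed, "India") ? "match" : "no match"));
        }
    }
}
```

With the prefixes stored in the index, a plain match on the typed text is enough and no fuzziness is involved. In a real setup the search side would typically use an analyzer that only lowercases, so the typed text is not split into n-grams itself.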
Upvotes: 1