Code Junkie

Reputation: 7788

How to add last two digits of year to hibernate search / lucene index

In my database I store years in their complete form, for example 2012, 2013, 2014. This is also how they are stored in my index. I would like to store the last two digits in the index as well, for example 12, 13, 14. Basically, I want users to be able to do a keyword search on either 2012 or 12.

My main search analyzer looks like this.

@AnalyzerDefs({
    @AnalyzerDef(name = "searchtokenanalyzer",
            // Split input into tokens according to tokenizer
            tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
            filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                    @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\-])"),
                    @Parameter(name = "replacement", value = ""),
                    @Parameter(name = "replace", value = "all")}),
                @TokenFilterDef(factory = StopFilterFactory.class),
                @TokenFilterDef(factory = TrimFilterFactory.class)
            }),

I have a second analyzer for handling the year abbreviation that looks like this.

@AnalyzerDef(name = "yearanalyzer",
            // KeywordTokenizer emits the whole input as a single token
            tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
            filters = {
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                    @Parameter(name = "pattern", value = "^.{2}"),
                    @Parameter(name = "replacement", value = ""),
                    @Parameter(name = "replace", value = "all")}),
                @TokenFilterDef(factory = StopFilterFactory.class),
                @TokenFilterDef(factory = TrimFilterFactory.class)
            })

And on my entity field I have the following.

@Entity
@Indexed
public class YearLookup {
    // The same property is indexed twice: once as the full year, once as the abbreviation
    @Fields({
        @Field(name = "name", store = Store.NO, index = Index.YES,
                analyze = Analyze.YES, analyzer = @Analyzer(definition = "searchtokenanalyzer")),
        @Field(name = "abbr", store = Store.NO, index = Index.YES,
                analyze = Analyze.YES, analyzer = @Analyzer(definition = "yearanalyzer"))
    })
    private String name;
}

So far everything is making it into the index correctly. I can see

name 2012,2013,2014
abbr 12,13,14

Now when I search against YearLookup.class with the following code, the abbr query term gets cut down by two digits again, producing an empty value, while name remains intact.

public interface SearchParam {
    public static final String[] SEARCH_FIELDS = new String[]{"yearLookup.name", "yearLookup.abbr"};
}

String searchString = "14";

QueryBuilder queryBuilder = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(YearLookup.class).get();

TermMatchingContext onWildCardFields = queryBuilder.keyword().wildcard().onField(SearchParam.SEARCH_FIELDS[0]);
            TermMatchingContext onFuzzyFields = queryBuilder.keyword().fuzzy().withThreshold(0.7f)
                    .withPrefixLength(1).onField(SearchParam.SEARCH_FIELDS[0]);

            //Iterate over all the remaining search fields stored in the "VehicleListing" index 
            for (int i = 1; i < SearchParam.SEARCH_FIELDS.length; i++) {
                onWildCardFields.andField(SearchParam.SEARCH_FIELDS[i]);
                onFuzzyFields.andField(SearchParam.SEARCH_FIELDS[i]);
            }

            String[] tokens = searchString.toLowerCase().split("\\s");

            for (String token : tokens) {
                luceneQuery = queryBuilder.bool()
                        .should(onWildCardFields.matching(token + "*").createQuery())
                        .should(onFuzzyFields.matching(token).createQuery())
                        .createQuery();
            }

FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(luceneQuery, YearLookup.class);

Integer results = fullTextQuery.getResultSize();

Now when I run my test case against this, I get the following exception.

HSEARCH000146: The query string '14' applied on field 'yearLookup.abbr' has no meaningfull tokens to be matched. Validate the query input against the Analyzer applied on this field.
org.hibernate.search.errors.EmptyQueryException
    at org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder.createQuery(ConnectedMultiFieldsTermQueryBuilder.java:111)
    at org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder.createQuery(ConnectedMultiFieldsTermQueryBuilder.java:86)
    at com.domain.auto.services.search.impl.SearchManagerImpl.doSearch(SearchManagerImpl.java:146)
    at $SearchManager_138fdc525111b303.doSearch(Unknown Source)
    at $SearchManager_138fdc525111b2f3.doSearch(Unknown Source)
    at com.domain.auto.services.search.impl.SearchServiceImplTest.testYearSearch(SearchServiceImplTest.java:92)

Does anybody have any thoughts?

Upvotes: 0

Views: 502

Answers (2)

Atul Kumar

Reputation: 749

Create a field bridge that handles the string in both cases, as below:

 @FieldBridge(impl = YearFieldBridge.class)
 private String name;

And create a bridge class somewhat similar to this:

public class YearFieldBridge implements StringBridge, Serializable {
    private static final long serialVersionUID = 1L;
    @Override
    public String objectToString(Object value) {
        if(value != null) {
            if(value instanceof String) {
                String strVal = (String) value;
                strVal = strVal.toUpperCase();
                if(strVal.length() == 2){
                    return "20"+strVal;
                }else{
                    return strVal;
                }
            }
        }
        return null;
    }
}

Upvotes: 0

Code Junkie

Reputation: 7788

Solution

@AnalyzerDef(name = "yearanalyzer",
        // KeywordTokenizer emits the whole input as a single token
        tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "^\\d{2}(\\d{2})$"),
                @Parameter(name = "replacement", value = "$1"),
                @Parameter(name = "replace", value = "all")}),
        })
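
To see why this pattern helps, here is a small standalone sketch using plain java.util.regex rather than the Lucene filter itself. It assumes that PatternReplaceFilterFactory with replace = "all" behaves like Matcher.replaceAll on the token; the class and method names are made up for illustration. The original pattern ^.{2} removes the first two characters of any input, so a query term that is already two digits long ends up empty, which appears to be what triggers HSEARCH000146. The new pattern only matches a full four-digit token and keeps the last two digits, so "14" passes through unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hibernate Search or Lucene
public class YearPatternDemo {
    // Pattern from the original yearanalyzer: strips the first two characters of anything
    static final Pattern ORIGINAL = Pattern.compile("^.{2}");
    // Pattern from the solution: matches only a four-digit token and captures the last two digits
    static final Pattern FIXED = Pattern.compile("^\\d{2}(\\d{2})$");

    static String stripOriginal(String token) {
        return ORIGINAL.matcher(token).replaceAll("");
    }

    static String stripFixed(String token) {
        return FIXED.matcher(token).replaceAll("$1");
    }

    public static void main(String[] args) {
        for (String token : new String[] {"2014", "14"}) {
            System.out.println(token + " -> original: '" + stripOriginal(token)
                    + "', fixed: '" + stripFixed(token) + "'");
        }
    }
}
```

With the original pattern, both the indexed value "2014" and the query term "14" are shortened, so the query term becomes empty. With the fixed pattern, "2014" is indexed as "14" and the query term "14" is left alone, so the two can match.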

Upvotes: 0
