H. Pauwelyn

Reputation: 14290

Lucene search doesn't work when I use spaces

My situation

I've built a search function with its own indexer and searcher. The problem occurs when the search query contains whitespace; see the example below.

Data

These persons are stored in my index:

Person number   First name   Last name
1               Ilse         Van de Burg
2               Devolder     Marlijn

Search results

I've tried the following queries:

Query number   Term                Actual result*   Expected result*
1              van                 1                1
2              van de              1                1
3              ilse                1                1
4              van de burg                          1
5              van de burg ilse                     1
6              de                  1 & 2            1 & 2
7              devolder            2                2
8              devolder marlijn                     2
9              marijn devolder                      2

* Number of the person. If empty: nothing found or expected.

Question

Some queries don't return what I expected. How can I solve this?

My code

Here is the code I wrote:

BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["PersonSearcher"];
ISearchCriteria searchCriteria = searcher.CreateSearchCriteria(BooleanOperation.Or);
ISearchCriteria query = searchCriteria.Field("lastname", term.MultipleCharacterWildcard()).Or()
                                      .Field("firstname", term.MultipleCharacterWildcard()).Or()
                                      .OrderBy("lastname", "firstname").Compile();
return searcher.Search(query);

Configuration (update 1)

Examine index

<IndexSet SetName="Artsen" IndexPath="~/App_Data/TEMP/ExamineIndexes/Artsen/">

  <IndexAttributeFields>
    <add Name="id" Type="int" />
    <add Name="nodeName" />
    <add Name="nodeTypeAlias" />
  </IndexAttributeFields>
  <IndexUserFields>
    <add Name="email" />
    <add Name="fax" />
    <add Name="naam" EnableSorting="true" />
    <add Name="onderzoeken" Type="int[]" />
    <add Name="specialismen" Type="int[]" />
    <add Name="subspecialismen" Type="int[]" />
    <add Name="telefoon" />
    <add Name="titel" EnableSorting="true" />
    <add Name="voornaam" EnableSorting="true" />
    <add Name="website" />
  </IndexUserFields>
  <IncludeNodeTypes>
    <add Name="arts" />
  </IncludeNodeTypes>
</IndexSet>

Examine settings (examine index provider):

<add name="ArtsenIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="false"
     supportProtected="true" indexSet="Artsen"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

Examine settings (examine search provider):

<add name="ArtsenSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" supportUnpublished="false"
     supportProtected="false" indexSet="Artsen" enableLeadingWildcard="true"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

Also tried (update 2)

I've also tried this and got the best results:

query = searchCriteria.GroupedOr(new List<string>() { "naam" }, term.MultipleCharacterWildcard(), term.Escape()).Or()
                      .GroupedOr(new List<string>() { "voornaam" }, term.MultipleCharacterWildcard(), term.Escape()).Or()
                      .GroupedOr(new List<string>() { "titel" }, term.MultipleCharacterWildcard(), term.Escape()).Or()
                      .OrderBy("naam", "voornaam").Compile();

When I call ToString() on the searchCriteria of the query above after searching for van de burg, it gives me this:

{ SearchIndexType: , LuceneQuery: (naam:van de burg* (naam:van de burg)) (voornaam:van de burg* (voornaam:van de burg)) (titel:van de burg* (titel:van de burg)) }

The remaining problem shows up when two persons share the same last name. For example:

Person number   First name   Last name
3               Marc         De Vadder
4               Freddy       De vadder

Search results:

Queries 1 through 9 now all return the expected results.

Query number   Term               Actual result*   Expected result*
10             de vadder          3 & 4            3 & 4
11             de vadder freddy   3 & 4            4
12             de vadder marc     3 & 4            3

* Number of the person. If empty: nothing found or expected.

Upvotes: 1

Views: 1282

Answers (1)

Marcin Zajkowski

Reputation: 1728

Looking at your results, everything works as designed: you're searching for the term in First Name OR Last Name OR Title, so you get every result that contains any part of the phrase in one of those fields.

As Examine does not fully support phrase queries, my suggestion is to create one searchable field that stores all of those fields combined, and to build a query against that field that looks for the individual terms of the phrase (not the whole phrase itself). This can get tricky, because you may not be able to control the order of the fields, so the results can be inconsistent too. It's worth experimenting with.

Sample code demonstrating this approach might look like this:

if (searchTerm.Contains(" "))
{
    string[] terms = searchTerm.Split(' ');
    examineQuery.And().GroupedOr(new List<string> { SearchableFieldToSearch }, terms);
}
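To make the combined-field idea more concrete, here is a minimal, self-contained sketch. It does not call Examine at all: CombineFields and BuildAllTermsQuery are hypothetical helpers, and "fullname" is an assumed name for the combined field. The first helper shows how the field values could be joined at indexing time, the second builds a Lucene query string in which every term of the phrase is required, which is a stricter variant of the GroupedOr call above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinedFieldSearch
{
    // Hypothetical helper: joins the non-empty values of the named fields
    // into one space-separated string that could be indexed as a single field.
    public static string CombineFields(IDictionary<string, string> fields, params string[] names)
    {
        return string.Join(" ", names
            .Where(n => fields.ContainsKey(n) && !string.IsNullOrWhiteSpace(fields[n]))
            .Select(n => fields[n].Trim()));
    }

    // Builds a Lucene query string in which every whitespace-separated term
    // is required (+) and matched as a prefix (*) against one field.
    // Assumption: the terms contain no Lucene special characters;
    // real user input should be escaped first.
    public static string BuildAllTermsQuery(string searchTerm, string field)
    {
        var terms = searchTerm
            .ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", terms.Select(t => "+" + field + ":" + t + "*"));
    }
}
```

With this sketch, "de vadder freddy" becomes a query that requires de*, vadder* and freddy* in the combined field, so a person whose combined value is "Marc De Vadder" should no longer match. How you feed the combined value into the index and run the resulting string depends on your Examine version, so treat that wiring as something to verify yourself.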

A second option is to separate the fields in the search form itself (separate inputs for the first name, last name and title, if that's possible) and build the query with a GroupedAnd operation.

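// "naam" holds the last name and "voornaam" the first name, so the values are listed in the same order as the fields.
criteria.GroupedAnd(new[] { "naam", "voornaam", "titel" }.ToList(), new[] { lastName, firstName, title });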

You can read more about Grouped Operations in the documentation: https://github.com/Shazwazza/Examine/wiki/Grouped-Operations.

If none of the above works, it may be worth building a query with custom boosting and simply stripping out the results whose score is lower than expected.

I hope this helps and points you in the right direction. Share your results :)
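As a rough illustration of the score-trimming idea, the sketch below filters hits by score. ScoredHit is a hypothetical stand-in for a search result with an Id and a Score, and KeepStrongHits is a made-up helper. Using a cutoff relative to the best hit is my own assumption rather than anything Examine prescribes; it is chosen because raw Lucene scores are not comparable from one query to the next.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not an Examine type.
public sealed class ScoredHit
{
    public int Id { get; set; }
    public float Score { get; set; }
}

public static class ScoreFilter
{
    // Keeps the hits whose score is at least `ratio` times the best score,
    // ordered from best to worst. An empty input yields an empty list.
    public static List<ScoredHit> KeepStrongHits(IEnumerable<ScoredHit> hits, float ratio)
    {
        var list = hits.ToList();
        if (list.Count == 0)
        {
            return list;
        }

        float cutoff = list.Max(h => h.Score) * ratio;

        return list
            .Where(h => h.Score >= cutoff)
            .OrderByDescending(h => h.Score)
            .ToList();
    }
}
```

The ratio would need tuning against real data: too high and legitimate partial matches disappear, too low and the unwanted person from queries 11 and 12 comes back.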

Upvotes: 4
