Dave

Reputation: 1450

Using SOLR Autocomplete for multiple terms (i.e. comma-separated locations)

I've got SOLR up and running, indexing data via the DIH, and properly returning results for queries. I'm trying to set up another core to run the suggester, in order to autocomplete geographical locations. We have a web application that needs to take a city, state / region, country input, and we'd like to do this in a single entry box. Here are some examples:

Brooklyn, New York, United States of America
Philadelphia, Pennsylvania, United States of America
Barcelona, Catalunya, Spain

Assume for now that every location around the world can be split into this 3-part form. I've set up my DIH with a TemplateTransformer field that combines the 4 tables (city, state and country are independent tables, connected to each other by a master places table) into a field called "fullplacename":

<field column="fullplacename" template="${city_join.plainname}, ${region_join.plainname}, ${country_join.plainname}"/>

I've defined a "text_auto" field in schema.xml:

<fieldType class="solr.TextField" name="text_auto">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

and have defined these two fields as well:

<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="true" />
<copyField source="fullplacename" dest="name_autocomplete" />

Now, here's my problem. This works fine for the first term, i.e. if I type "brooklyn" I get the results I'd expect, using this URL to query:

http://localhost:8983/solr/places/suggest?q=brooklyn

However, as soon as I put a comma and/or a space in there, it breaks them up into 2 suggestions, and I get a suggestion for each:

http://localhost:8983/solr/places/suggest?q=brooklyn%2C%20ny

This gives me a suggestion for "brooklyn" and a suggestion for "ny" instead of a suggestion that matches "brooklyn, ny". I've tried every solution I can find via Google and haven't had any luck. Is there something simple that I've missed, or is this the wrong approach?

Thanks!

EDIT: Just in case, here's the searchComponent and requestHandler definition:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

<searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">name_autocomplete</str>
    </lst>
</searchComponent>

Upvotes: 4

Views: 7202

Answers (3)

nish

Reputation: 7280

I feel the accepted answer is a bit too complex. A simpler way is to pass the input in the spellcheck.q parameter instead of q, i.e. use http://localhost:8983/solr/places/suggest?spellcheck.q=brooklyn in place of http://localhost:8983/solr/places/suggest?q=brooklyn, as mentioned here.

Upvotes: 0

Risadinha

Reputation: 16666

You are using the KeywordTokenizer which will not create separate tokens for "Brooklyn", "NY" and "United States".

Your example queries do not look so much like autocomplete but more like regular searches.

Autocomplete query (IMHO) contains only partial terms:

http://localhost:8983/solr/places/suggest?q=brook

for type-ahead lists. You want to use EdgeNGram for that: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory, most probably in combination with StandardTokenizer and/or WordDelimiterFilterFactory.
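As a rough sketch of what such a field type could look like in schema.xml: the name text_ngram_auto and the gram sizes below are made-up placeholders, not taken from the question. The n-grams are generated only at index time, so a partial query such as "brook" is matched as-is against the indexed prefixes. Note that a field like this is meant to be queried through a regular search handler rather than through the suggester.

```xml
<!-- Hypothetical field type: the name and the gram sizes are assumptions -->
<fieldType name="text_ngram_auto" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Indexes the leading prefixes of each token: b, br, bro, ... -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
        <!-- No n-grams at query time: the typed prefix is matched directly -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```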

For your query example:

http://localhost:8983/solr/places/suggest?q=brooklyn%2C%20ny

StandardTokenizer in combination with LowerCaseFilter and the dismax request handler, with the mm parameter configured to restrict hits to those that contain all input terms, would work well. See: http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
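A minimal sketch of such a handler in solrconfig.xml might look like the following. The handler name /placesearch and the field name fullplacename_text are hypothetical; the field is assumed to be a tokenized, lowercased copy of "fullplacename". Setting mm to 100% requires every input term to match.

```xml
<!-- Hypothetical handler; name and qf field are assumptions -->
<requestHandler name="/placesearch" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="defType">dismax</str>
        <!-- Tokenized field to search; assumed to exist in schema.xml -->
        <str name="qf">fullplacename_text</str>
        <!-- Require all query terms to match -->
        <str name="mm">100%</str>
        <int name="rows">10</int>
    </lst>
</requestHandler>
```

With this in place, a request such as http://localhost:8983/solr/places/placesearch?q=brooklyn%2C%20ny would only return places whose indexed text matches both "brooklyn" and "ny".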

Upvotes: 0

Okke Klein

Reputation: 2549

The problem lies in the suggester. Like the spellchecker, it tokenizes the query on whitespace.

http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3266140.html has a solution for this problem.
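One workaround of this kind, which is not necessarily the one described in the linked thread, is to replace the default query converter with one that hands the whole query string to the field's analyzer. The sketch below assumes a Solr version that ships org.apache.solr.spelling.SuggestQueryConverter; on older versions a custom QueryConverter class would be needed instead.

```xml
<!-- solrconfig.xml, top level (outside the searchComponent).
     Assumes this class is available in the Solr version in use. -->
<queryConverter name="queryConverter"
                class="org.apache.solr.spelling.SuggestQueryConverter"/>
```

Combined with the KeywordTokenizer-based "text_auto" field type from the question, the input "brooklyn, ny" should then reach the suggester as a single token.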

Upvotes: 2
