neolaser

Reputation: 6907

Django-Haystack with Solr contains search

I am using Haystack in a project with Solr as the backend. I want to perform a "contains" search, similar to Django's .filter(something__contains="...").

The __startswith option does not suit our needs because, as the name suggests, it only matches words that start with the string.

I tried something like *keyword*, but Solr does not allow * as the first character.

Thanks.

Upvotes: 8

Views: 3023

Answers (4)

Nahn

Reputation: 3256

None of the other answers here performs a real substring search (*keyword*).

They do not find a keyword that sits inside a longer string, that is, one that is neither a prefix nor a suffix.

Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.

The solution is to use an NgramField like this:

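from haystack import indexes

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    # NgramField indexes n-grams taken from anywhere in the text, not only the edges
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...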

This is very elegant, because you don't need to manually add anything to the schema.xml.
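To illustrate why the two field types behave differently, here is a minimal pure-Python sketch of the two tokenization styles. It does not call Haystack or Solr; the function names and the gram size bounds are assumptions chosen for illustration, and the bounds your generated schema actually uses may differ.

```python
def edge_ngrams(word, min_size=2, max_size=15):
    """Front-anchored grams: only prefixes of the word (illustrative bounds)."""
    upper = min(max_size, len(word))
    return {word[:n] for n in range(min_size, upper + 1)}


def ngrams(word, min_size=3, max_size=15):
    """All contiguous substrings within the size bounds (illustrative bounds)."""
    upper = min(max_size, len(word))
    return {
        word[i:i + n]
        for n in range(min_size, upper + 1)
        for i in range(len(word) - n + 1)
    }


word = "keyword"
print("eywo" in edge_ngrams(word))  # False: "eywo" is not a prefix
print("eywo" in ngrams(word))       # True: "eywo" occurs inside the word
```

Only the full n-gram set contains pieces from the middle of the word, which is what a "contains" lookup needs.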

Upvotes: 0

Facundo Olano

Reputation: 2609

You can achieve the same behavior without having to touch the Solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood, this generates a schema similar to the one lindstromhenrik suggested.

Upvotes: 2

HolgT

Reputation: 663

I am using an expression like .filter(something__startswith='...') .filter_or(name=''+s'...'), as it seems Solr does not like an expression like '...*' on its own, but combining it with an OR works.

Upvotes: 0

lindstromhenrik

Reputation: 1143

To get "contains" functionality you can use:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />

as the index analyzer.

This will create n-grams for every whitespace-separated word in your field. For example:

"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!

As you can see, this will expand your index greatly, but if you now enter a query like:

"nde*"

it will match "ndex", giving you a hit.
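The analysis chain above can be imitated in a few lines of plain Python to check the example by hand. This is only a sketch of the idea, not Solr's implementation; the helper names are made up for illustration.

```python
def back_edge_ngrams(text, min_size=1, max_size=100):
    """Imitate whitespace tokenizing, back-anchored edge n-grams, then lower-casing."""
    terms = []
    for token in text.split():  # split on whitespace
        upper = min(max_size, len(token))
        for n in range(min_size, upper + 1):
            terms.append(token[-n:].lower())  # last n characters of the token
    return terms


def prefix_matches(terms, query):
    """Imitate a trailing-wildcard query such as 'nde*' against the indexed terms."""
    prefix = query.rstrip("*").lower()
    return [term for term in terms if term.startswith(prefix)]


terms = back_edge_ngrams("Index this!")
print(terms)
# ['x', 'ex', 'dex', 'ndex', 'index', '!', 's!', 'is!', 'his!', 'this!']
print(prefix_matches(terms, "nde*"))
# ['ndex']
```

Every suffix of a word is indexed as its own term, so a trailing wildcard can start from any position inside the word.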

Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, the index will not expand as much, but the "contains" functionality will be reduced. For instance, setting minGramSize="3" will require that you have at least 3 characters in your contains query.
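The trade-off can be made concrete with a small pure-Python approximation of the same back-anchored grams. The helper name is hypothetical and the numbers only apply to this toy input, not to a real Solr index.

```python
def back_grams(text, min_size, max_size):
    """Lower-cased suffix grams of each whitespace-separated token."""
    return [
        token[-n:].lower()
        for token in text.split()
        for n in range(min_size, min(max_size, len(token)) + 1)
    ]


for min_size in (1, 3):
    terms = back_grams("Index this!", min_size, 100)
    hits = [term for term in terms if term.startswith("ex")]
    print(min_size, len(terms), hits)
# 1 10 ['ex']
# 3 6 []
```

With the larger minimum, fewer terms are stored, but the short gram "ex" is no longer indexed, so the two-character query "ex*" finds nothing.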

Upvotes: 10
