Claudia Guirao

Reputation: 335

Searching with numbers - python - whoosh

I have indexed all my documents with a schema like this:

ID = ID(stored=True)
Body = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
Name = TEXT(stored=True, field_boost=5.0)
Brand = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
...

My search module looks like this:

qp = MultifieldParser(["Name", "Body", "Brand",
                       "Familia", "Superpadre", "Tags", "ID"], schema=ix.schema)

But when I search for iphone 6, the parsed query looks like this:

<Top 20 Results for Or([Term('Name', u'iphone'), Term('Body',
 u'iphon'), Term('Brand', u'iphon'), Term('Familia', u'iphon'), 
Term('Superpadre', u'iphon'), And([Term('Tags', u'iphone'),  
Term('Tags', u'6')]), Term('ID', u'iphon')]) runtime=0.0327291488647>

The digit 6 is only searched for in Tags, not in Name, Brand, or the other fields.

How can I make the search look for the digit in the other fields as well?

Thank you all in advance.

Upvotes: 2

Views: 537

Answers (2)

Assem

Reputation: 12107

By default, Whoosh drops every single-character token in the same filter that removes stop words. This means single letters and single digits, such as the 6 in your query, are ignored.

Stop words are words which are filtered out before or after processing of natural language data (text). (ref)

You can check that StopFilter defaults to minsize=2, a length limit that applies in addition to the predefined stop list:

class whoosh.analysis.StopFilter(
        stoplist=frozenset(['and', 'is', 'it', 'an', 'as', 'at', 'have', 'in', 'yet', 'if', 'from', 'for', 'when', 'by', 'to', 'you', 'be', 'we', 'that', 'may', 'not', 'with', 'tbd', 'a', 'on', 'your', 'this', 'of', 'us', 'will', 'can', 'the', 'or', 'are']),
        minsize=2,
        maxsize=None,
        renumber=True,
        lang=None
        )
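To make the effect of minsize concrete, here is a small pure-Python sketch of the rule described above. It is a simplified model written for illustration, not Whoosh's actual source: the names keep_token and filter_tokens are hypothetical, and the stop list is abbreviated.

```python
# Simplified model of a stop filter with a minimum token length.
# Hypothetical helper names; this is NOT Whoosh's implementation.
STOP_WORDS = frozenset(["and", "is", "it", "a", "the", "of"])  # abbreviated list


def keep_token(text, stoplist=STOP_WORDS, minsize=2, maxsize=None):
    """Return True if the token survives the length and stop-word checks."""
    if len(text) < minsize:
        return False  # too short, e.g. "6" when minsize=2
    if maxsize is not None and len(text) > maxsize:
        return False  # too long
    return text not in stoplist


def filter_tokens(tokens, **kwargs):
    """Apply keep_token to each token, preserving order."""
    return [t for t in tokens if keep_token(t, **kwargs)]


default_kept = filter_tokens(["iphone", "6"])             # minsize=2 drops "6"
relaxed_kept = filter_tokens(["iphone", "6"], minsize=1)  # "6" is kept
print(default_kept)
print(relaxed_kept)
```

Note that lowering minsize only relaxes the length check. A one-letter stop word such as "a" would still be removed because it is in the stop list.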

So you can resolve this by redefining your schema and either removing the StopFilter or using it with minsize=1. Because the analyzer also runs at indexing time, reindex your documents after changing it:

from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT

schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(stoplist=None)))

or

schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(minsize=1)))

Upvotes: 2

Claudia Guirao

Reputation: 335

Solved by passing this parameter to the analyzer in my schema:

StemmingAnalyzer(minsize=1)

Upvotes: 0
