Reputation: 335
I have indexed all my documents with a schema like this:
ID = ID(stored=True)
Body = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
Name = TEXT(stored=True, field_boost=5.0)
Brand = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
...
My search module looks like this:
qp = MultifieldParser(["Name", "Body", "Brand",
"Familia","Superpadre","Tags","ID"], schema=ix.schema)
But when I search for iphone 6, the parsed query looks like this:
<Top 20 Results for Or([Term('Name', u'iphone'), Term('Body',
u'iphon'), Term('Brand', u'iphon'), Term('Familia', u'iphon'),
Term('Superpadre', u'iphon'), And([Term('Tags', u'iphone'),
Term('Tags', u'6')]), Term('ID', u'iphon')]) runtime=0.0327291488647>
It only searches for the digit 6 in Tags, not in Name, Brand, etc.
How can I make it search for 6 in the other fields as well?
Thank you all in advance.
Reputation: 12107
By default, the StopFilter used by Whoosh's analyzers treats every single-character word like a stop word and drops it. This means all single letters and single digits are ignored.
Stop words are words that are filtered out before or after processing natural-language text. (ref)
You can check this in the StopFilter signature: in addition to the predefined stop list, it has minsize=2 by default.
class whoosh.analysis.StopFilter(
stoplist=frozenset(['and', 'is', 'it', 'an', 'as', 'at', 'have', 'in', 'yet', 'if', 'from', 'for', 'when', 'by', 'to', 'you', 'be', 'we', 'that', 'may', 'not', 'with', 'tbd', 'a', 'on', 'your', 'this', 'of', 'us', 'will', 'can', 'the', 'or', 'are']),
minsize=2,
maxsize=None,
renumber=True,
lang=None
)
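So you can resolve this issue by redefining your schema, either removing the StopFilter or using it with minsize=1, and then re-indexing your documents (tokens dropped at indexing time are not in the existing index):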
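from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT
# Option 1: no StopFilter at all (no stop list, no minimum token length)
schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(stoplist=None)))
# Option 2: keep the stop list, but also keep single-character tokens
schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(minsize=1)))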
Reputation: 335
Solved by adding this parameter to the analyzers in my schema:
StemmingAnalyzer(minsize=1)