Reputation: 53
I want to find a phrase in a document. I started from the code in the quick start:
>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
>>> ix = create_in("indexdir", schema)
>>> writer = ix.writer()
>>> writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
>>> writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!")
>>> writer.commit()
>>> from whoosh.qparser import QueryParser
>>> with ix.searcher() as searcher:
...     query = QueryParser("content", ix.schema).parse("first")
...     results = searcher.search(query)
...     results[0]
...
result: {"title": u"First document", "path": u"/a"}
But then I found that the parser splits the query into single words and searches the document for each of them. If I want to search for a phrase like "the first guy here in the document", what should I do?
The documentation says to use
"it is a phrase"
if I want to search for:
it is a phrase.
That confuses me.
Besides, there is a class that seems like it could help, but I don't know how to use it:
class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
Matches documents containing a given phrase.
Update: I used it this way, but there are no matches.
from whoosh.index import create_in
from whoosh.fields import *
schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix = create_in("indexdir", schema)
writer = ix.writer()
writer.add_document(title=u"First document", path=u"/a",
                    content=u"This is the first document we've added!")
writer.add_document(title=u"Second document", path=u"/b",
                    content=u"The second one is even more interesting!")
writer.commit()
from whoosh.query import Phrase
a = Phrase("content", u"the first")
results = ix.searcher().search(a)
print results
result:
<Top 0 Results for Phrase('content', u'the first', slop=1, boost=1.000000) runtime=0.0>
Update, following theOther's answer:
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse('"first x document"')
    results = searcher.search(query)
    print results[0]
result: <Hit {'content': u"This is the first document we've added!", 'path': u'/a', 'title': u'First document'}>
I think there should be no match, because the text "first x document" does not appear in the document. If it matches anyway, it is not an exact match.
Upvotes: 5
Views: 4302
Reputation: 1805
To search for a phrase in a field, set phrase=True when defining the schema:
schema = Schema(title=TEXT(stored=True), content=TEXT(phrase=True))
Then put the phrase in double quotes inside the single-quoted query string:
query = QueryParser("content", schema=ix.schema).parse('"exact phrase"')
Upvotes: 1
Reputation: 12097
You should give Phrase a list of words, not a string, as the second argument, and also drop "the" because it is a stop word:
a = Phrase("content", [u"first",u"document"])
instead of
a = Phrase("content", u"the first")
From the documentation:
class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
Matches documents containing a given phrase.
Parameters:
fieldname – the field to search.
words – a list of words (unicode strings) in the phrase.
The natural way to do a phrase search in Whoosh is to use double quotes " " in the QueryParser query string:
>>> with ix.searcher() as searcher:
...     query = QueryParser("content", ix.schema).parse('"first document"')
...     results = searcher.search(query)
...     results[0]
Update: "first x document" matches because x, like every one-character word, is treated as a stop word and filtered out, so the query is reduced to "first document".
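To see why the filtered phrase still matches, here is a toy model in plain Python. It is not Whoosh's implementation: the stop-word set is a small illustrative subset, and the minimum token length of 2 and the helper names analyze and contains_phrase are assumptions made for this sketch.

```python
import re

# Illustrative subset of stop words; NOT Whoosh's actual list.
STOP_WORDS = {"a", "an", "is", "the", "this", "we"}
MIN_SIZE = 2  # assumed minimum token length; shorter tokens are dropped


def analyze(text):
    """Lowercase, split into word tokens, drop stop words and short tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if len(t) >= MIN_SIZE and t not in STOP_WORDS]


def contains_phrase(document, phrase):
    """True if the analyzed phrase occurs as consecutive analyzed tokens."""
    doc, words = analyze(document), analyze(phrase)
    if not words:
        return False
    return any(doc[i:i + len(words)] == words
               for i in range(len(doc) - len(words) + 1))


doc = "This is the first document we've added!"
print(analyze("first x document"))               # the "x" is filtered out
print(contains_phrase(doc, "first x document"))  # compared as "first document"
print(contains_phrase(doc, "first added"))       # the two words are not adjacent
```

In this model the match is exact only after analysis, which is the behaviour seen in the question. In Whoosh itself, I believe giving the TEXT field an analyzer built with StandardAnalyzer(stoplist=None) turns this filtering off, but check the analysis documentation before relying on that.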
Upvotes: 5