Reputation: 1784
I am evaluating the App Engine Document Index full-text search and ran into some problems while using the stem operator '~'. Basically, I created an index of a few test documents, all with a title field. Some of the example values of that field are:
"Houses Desks Tables"
"referer image vod event"
"events with cats and dogs and"
"names very interesting days"
I'm using Java, and a snippet of my indexing and query code looks like this:
```java
Document doc = Document.newBuilder().setId(key)
    .addField(Field.newBuilder().setName("title").setText(title))
    .addField(Field.newBuilder().setName("type").setText(type))
    .addField(Field.newBuilder().setName("username").setText(username))
    .build();
DocumentSearchIndexService.getInstance().indexDocument(indexName, doc);

IndexSpec indexSpec = IndexSpec.newBuilder().setName(indexName).build();
Index index = SearchServiceFactory.getSearchService().getIndex(indexSpec);
return index.search("title = ~" + searchText);
```
However, the search only ever matches the exact word form that appears in the title, so the singular queries never find the plural words:
query "cat": returns nothing
query "dog": returns nothing
query "name": returns nothing
query "house": returns nothing
query "cats": returns "events with cats and dogs and"
query "dogs": returns "events with cats and dogs and"
query "names": returns "names very interesting days"
query "houses": returns "Houses Desks Tables"
So I am lost as to how the entries are matched, and whether my query is constructed incorrectly.
Upvotes: 0
Views: 59
Reputation: 3764
Note that stemming is not implemented in the Java development server for the Java 8 Standard Environment, so when you run locally a query such as ~cat only matches the exact word.
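For intuition about what the deployed service does and the development server skips: with "~", a query term matches a document term when both reduce to the same stem. The sketch below is a deliberately naive suffix-stripping stemmer written only to illustrate that idea. It is an assumption for illustration, not the algorithm App Engine actually uses, and ToyStemmer, stem and stemMatches are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: NOT the App Engine Search API's stemmer.
final class ToyStemmer {

  // Naive suffix stripping: reduces a few regular English plurals to a stem.
  static String stem(String word) {
    String w = word.toLowerCase();
    if (w.endsWith("ies") && w.length() > 4) {
      return w.substring(0, w.length() - 3) + "y"; // "puppies" -> "puppy"
    }
    if (w.endsWith("ches") || w.endsWith("shes") || w.endsWith("xes")) {
      return w.substring(0, w.length() - 2); // "boxes" -> "box"
    }
    if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
      return w.substring(0, w.length() - 1); // "cats" -> "cat", "houses" -> "house"
    }
    return w;
  }

  // A "~" style match: both terms reduce to the same stem.
  static boolean stemMatches(String queryTerm, String documentTerm) {
    return stem(queryTerm).equals(stem(documentTerm));
  }

  public static void main(String[] args) {
    List<String> title = Arrays.asList("events", "with", "cats", "and", "dogs", "and");
    for (String query : Arrays.asList("cat", "dog", "event", "mouse")) {
      boolean hit = title.stream().anyMatch(t -> stemMatches(query, t));
      System.out.println(query + " -> " + hit);
    }
    // Limitation: irregular plurals are not handled by suffix stripping.
    System.out.println(stemMatches("mouse", "mice")); // false
  }
}
```

Against the words of "events with cats and dogs and", the queries cat, dog and event all match, while mouse never matches mice: irregular forms are exactly the kind of case a suffix-based stemmer misses.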
If you are deploying your application to App Engine, use the Utils.java class from Google Cloud Platform's java-docs-samples repository (in the appengine-java8/search folder) to index your documents properly.
To test this, I cloned that repository, went to the appengine-java8/search folder, and modified the SearchServlet.java class as follows so that it runs a query with the stem operator "~":
```java
...
  @Override
  public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    PrintWriter out = resp.getWriter();
    Document doc =
        Document.newBuilder()
            .setId("theOnlyPiano")
            .addField(Field.newBuilder().setName("product").setText("cats and dogs"))
            .addField(Field.newBuilder().setName("maker").setText("Yamaha"))
            .addField(Field.newBuilder().setName("price").setNumber(4000))
            .build();
    try {
      Utils.indexADocument(SEARCH_INDEX, doc);
    } catch (InterruptedException e) {
      // ignore
    }
    // [START search_document]
    final int maxRetry = 3;
    int attempts = 0;
    int delay = 2;
    while (true) {
      try {
        String searchText = "cat";
        // Stem query: on App Engine this should also match "cats" in the product field.
        String queryString = "product = ~" + searchText;
        Results<ScoredDocument> results = getIndex().search(queryString);
        // Iterate over the documents in the results
        for (ScoredDocument document : results) {
          // handle results
          out.print("product: " + document.getOnlyField("product").getText());
          //out.println(", price: " + document.getOnlyField("price").getNumber());
        }
      } catch (SearchException e) {
        if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())
            && ++attempts < maxRetry) {
          // retry
          try {
            Thread.sleep(delay * 1000);
          } catch (InterruptedException e1) {
            // ignore
          }
          delay *= 2; // easy exponential backoff
          continue;
        } else {
          throw e;
        }
      }
      break;
    }
    // [END search_document]
    // We don't test the search result below, but we're fine if it runs without errors.
    out.println(" Search performed");
    Index index = getIndex();
    // [START simple_search_1]
    index.search("rose water");
    // [END simple_search_1]
    // [START simple_search_2]
    index.search("1776-07-04");
    // [END simple_search_2]
    // [START simple_search_3]
    // search for documents whose product field stems to "cat" and whose price is below 5000
    index.search("product = ~cat AND price < 5000");
    // [END simple_search_3]
  }
}
```
With this change deployed, I was able to verify that the stem operator "~" works correctly for plurals (words like cats, dogs, etc.). Note, however, that as the documentation mentions, the stemming algorithm has its limitations.
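Separately, the question's "title = ~" + searchText only puts the stem operator in front of the first word when searchText contains several words. A minimal sketch of a hypothetical helper, stemmedFieldQuery, that prefixes every word with "~" and joins the field-restricted terms with AND, the same operators used in the sample query product = ~cat AND price < 5000. It is an assumption, not part of the Search API, and it does no escaping of quotes or query operators in user input, which a real implementation would need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the App Engine Search API.
final class StemQueryBuilder {

  // Builds e.g. "title = ~cat AND title = ~dog" from the input "cat dog".
  // Returns an empty string if the input has no words.
  static String stemmedFieldQuery(String field, String searchText) {
    List<String> clauses = new ArrayList<>();
    for (String word : searchText.trim().split("\\s+")) {
      if (!word.isEmpty()) {
        clauses.add(field + " = ~" + word);
      }
    }
    return String.join(" AND ", clauses);
  }

  public static void main(String[] args) {
    System.out.println(stemmedFieldQuery("title", "cat"));        // title = ~cat
    System.out.println(stemmedFieldQuery("title", " cat  dog ")); // title = ~cat AND title = ~dog
  }
}
```

The resulting string would then be passed to index.search(...) as in the question's code.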
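Note: if you want to replicate these steps, don't forget to comment out the body of the test in the SearchServletTest.java class before deploying the application to App Engine with mvn appengine:deploy. Its assertions expect output such as "maker: Yamaha" that the modified servlet no longer prints, so the test would otherwise fail. The file should look like this: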
```java
...
  @After
  public void tearDown() {
    helper.tearDown();
  }

  @Test
  public void doGet_successfulyInvoked() throws Exception {
    // servletUnderTest.doGet(mockRequest, mockResponse);
    // String content = responseWriter.toString();
    // assertWithMessage("SearchServlet response").that(content).contains("maker: Yamaha");
    // assertWithMessage("SearchServlet response").that(content).contains("price: 4000.0");
  }
}
```
Upvotes: 1