Reputation: 1108
I created a custom token filter that concatenates all the tokens in the stream. This is my incrementToken() method:
    public boolean incrementToken() throws IOException {
        if (finished) {
            logger.debug("Finished");
            return false;
        }
        logger.debug("Starting");
        StringBuilder buffer = new StringBuilder();
        int length = 0;
        while (input.incrementToken()) {
            if (0 == length) {
                buffer.append(termAtt);
                length += termAtt.length();
            } else {
                buffer.append(" ").append(termAtt);
                length += termAtt.length() + 1;
            }
        }
        termAtt.setEmpty().append(buffer);
        //offsetAtt.setOffset(0, length);
        finished = true;
        return true;
    }
I added the new filter to the end of both the index and query analysis chains for a field. Testing it from http://localhost:8983/solr/admin/analysis.jsp seems to work: the filter concatenates the tokens in the stream. But when I re-index the documents, only my first document gets indexed.
This is what my filter chain looks like:
    <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" />
        <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" />
        <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
    </analyzer>
Without the ConcatFilterFactory, all words are indexed properly, but with the ConcatFilterFactory only the first document is indexed. What am I doing wrong? Please help me understand the problem.
UPDATE:
Finally figured out the issue.
    if (finished) {
        logger.debug("Finished");
        finished = false;
        return false;
    }
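Looks like the same filter instance is reused for each document, so the finished flag left over from the first document made every later document return no tokens. Resetting the flag makes sense.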
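For anyone hitting the same thing, here is a minimal plain-Java sketch of the reuse problem. It does not use Lucene at all: ConcatModel, reset(List) and indexAll are hypothetical names made up for illustration, and the model only mimics a consumer that resets one filter instance before each document. In Lucene itself, as far as I understand, the usual place to clear per-document state like this is an overridden reset() in the filter (calling super.reset() first), which also covers a consumer that stops reading before incrementToken() returns false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model (no Lucene) of a concatenating filter whose instance is reused per document. */
public class ConcatReuseSketch {

    /** Hypothetical stand-in for the concatenating filter; this is not the Lucene API. */
    public static class ConcatModel {
        private Iterator<String> input = Collections.emptyIterator();
        private boolean finished = false;
        private String term = "";

        /** Models the consumer resetting the stream before each document. */
        public void reset(List<String> tokens) {
            input = tokens.iterator();
            finished = false; // clear per-document state; without this, later documents emit nothing
        }

        /** Same shape as the question's incrementToken(): emit one joined token, then stop. */
        public boolean incrementToken() {
            if (finished) {
                return false;
            }
            StringBuilder buffer = new StringBuilder();
            while (input.hasNext()) {
                if (buffer.length() > 0) {
                    buffer.append(' ');
                }
                buffer.append(input.next());
            }
            term = buffer.toString();
            finished = true;
            return true;
        }

        public String term() {
            return term;
        }
    }

    /** Runs ONE reused instance over several documents and collects every emitted token. */
    public static List<String> indexAll(List<List<String>> docs) {
        ConcatModel filter = new ConcatModel();
        List<String> emitted = new ArrayList<>();
        for (List<String> doc : docs) {
            filter.reset(doc);
            while (filter.incrementToken()) {
                emitted.add(filter.term());
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("quick", "brown", "fox"),
                Arrays.asList("lazy", "dog"));
        System.out.println(indexAll(docs));
    }
}
```

If you remove the finished = false line from reset(List) in this model, only the first document produces a token, which matches the behaviour described in the question.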
Upvotes: 5
Views: 500
Reputation: 556
You should write a unit test for your filter. It should fail even if the analysis page appears to work. Apparently you forgot to add this line before returning false:
finished = false;
Upvotes: 0