Reputation: 170
I have an existing index; adding new documents and searching works fine. However, updating and deleting existing documents does not work if the deletion term has an alphanumeric value (ABC123 or ABC); with numeric values everything works. I'm using Lucene 8.11.2 and Java 8, with StandardAnalyzer. Below is my simplified code:
public class MyDirectory {
    @Getter
    private Directory index;
    @Getter
    private IndexWriter writer;

    public MyDirectory(String indexPath) throws IOException {
        index = FSDirectory.open(Paths.get(indexPath));
    }

    public void addNewDocument() {
        try {
            openWriter();
            Document doc = new Document();
            doc.add(new TextField("ID", "ABC123", Field.Store.YES));
            getWriter().addDocument(doc);
            closeWriter();
        } catch (Exception e) {
            // error handling omitted for brevity
        }
    }

    public void updateDocument() {
        try {
            openWriter();
            Term delTerm = new Term("ID", "ABC123");
            List<Document> docs = new ArrayList<>();
            Document doc = new Document();
            doc.add(new TextField("ID", "ABC123", Field.Store.YES));
            doc.add(new TextField("NAME", "test", Field.Store.YES));
            docs.add(doc);
            // Adds a second document with ID "ABC123" and name "test" to the index.
            // I'm expecting the old document with ID "ABC123" to be removed here.
            // If the ID is 123 (numbers only), it works.
            getWriter().updateDocuments(delTerm, docs);
            closeWriter();
        } catch (Exception e) {
            // error handling omitted for brevity
        }
    }

    private void openWriter() throws IOException {
        writer = new IndexWriter(getIndex(), new IndexWriterConfig(getPerFieldAnalyzer()));
    }

    private PerFieldAnalyzerWrapper getPerFieldAnalyzer() {
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    }

    private void closeWriter() {
        try {
            getWriter().close();
        } catch (IOException e) {
            // error handling omitted for brevity
        }
    }
}
Do I need to use a different analyzer for that field?
Upvotes: 1
Views: 56
Reputation: 170
After some investigation, I figured out that a Term does not tokenize its input text, so the deletion was not performed: the ID field was added to the document as a TextField and was therefore tokenized (and lowercased by StandardAnalyzer), and the raw term never matched. I changed TextField to StringField, which does not tokenize, and then update/delete worked as expected. However, in that case a regular analyzed search by ID no longer matches, so I ended up having two ID fields in the index: a tokenized one for external search and an untokenized one for internal use.
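A minimal, self-contained sketch of the dual-field approach described above (using an in-memory directory for the demo; the field name `ID_RAW` is my own choice for the untokenized copy, not anything Lucene prescribes):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DualIdFieldExample {

    public static void main(String[] args) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // in-memory index, demo only
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        writer.addDocument(makeDoc("ABC123", "old"));
        writer.commit();

        // The update term targets the untokenized ID_RAW field,
        // so "ABC123" is compared verbatim against the indexed value.
        writer.updateDocument(new Term("ID_RAW", "ABC123"), makeDoc("ABC123", "test"));
        writer.commit();
        writer.close();

        try (IndexReader reader = DirectoryReader.open(dir)) {
            // The old document was replaced, so only one remains
            System.out.println("docs: " + reader.numDocs());
        }
    }

    private static Document makeDoc(String id, String name) {
        Document doc = new Document();
        doc.add(new StringField("ID_RAW", id, Field.Store.NO)); // untokenized: for update/delete terms
        doc.add(new TextField("ID", id, Field.Store.YES));      // tokenized: for regular analyzed search
        doc.add(new TextField("NAME", name, Field.Store.YES));
        return doc;
    }
}
```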
Also, another solution for the update was to use a Query with the deleteDocuments() method and then add the new documents:
BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
QueryParser queryParser = new QueryParser("ID", getPerFieldAnalyzer());
queryBuilder.add(queryParser.parse("ABC123"), BooleanClause.Occur.FILTER);
getWriter().deleteDocuments(queryBuilder.build());

List<Document> docs = new ArrayList<>();
Document doc = new Document();
doc.add(new TextField("ID", "ABC123", Field.Store.YES));
doc.add(new TextField("NAME", "test", Field.Store.YES));
docs.add(doc);
getWriter().addDocuments(docs);
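This query-based variant works because QueryParser runs the same analyzer over the query text as was used at index time. For reference, here is a small sketch showing why the raw Term never matched: StandardAnalyzer lowercases tokens, so "ABC123" is indexed as "abc123" (purely numeric values like "123" are unchanged, which is why those worked):

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShowAnalyzedTokens {

    public static void main(String[] args) throws IOException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream("ID", "ABC123")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // StandardAnalyzer lowercases, so the indexed token differs
                // from the raw Term value "ABC123"
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}
```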
Upvotes: 1