Teodor Mysko


Lucene: documents are not removed from index if term contains alphanumeric value

I have an existing index; adding new documents and searching work fine. However, updating and deleting existing documents does not work when the deletion term has an alphanumeric value (ABC123 or ABC); with purely numeric values everything works. I'm using Lucene 8.11.2, Java 8 and StandardAnalyzer. Below is my simplified code:

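import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import lombok.Getter;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MyDirectory {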
  
  @Getter
  private Directory index;
  @Getter
  private IndexWriter writer;

  public MyDirectory(String indexPath) throws IOException {
    index = FSDirectory.open(Paths.get(indexPath));
  }

  public void addNewDocument() {
    try {
      openWriter();

      Document doc = new Document();
      doc.add(new TextField("ID", "ABC123", Field.Store.YES));
      getWriter().addDocument(doc);

      closeWriter();
    } catch (Exception e) {
    } 
  }

  public void updateDocument() {
    try {
      openWriter();

      Term delTerm = new Term("ID", "ABC123");
      List<Document> docs = new ArrayList<>();
      Document doc = new Document();
      doc.add(new TextField("ID", "ABC123", Field.Store.YES));
      doc.add(new TextField("NAME", "test", Field.Store.YES));
      docs.add(doc);

      // Adds a second document with ID ABC123 and NAME 'test' to the index.
      // I expect the old document with ID ABC123 to be removed here, but it is not.
      // If the ID is numeric only (e.g. 123), it works.
      getWriter().updateDocuments(delTerm, docs);
      closeWriter();
    } catch (Exception e) {
    }
  }

  private void openWriter() throws IOException {
    writer = new IndexWriter(getIndex(), new IndexWriterConfig(getPerFieldAnalyzer()));
  }


  private PerFieldAnalyzerWrapper getPerFieldAnalyzer() {
    return new PerFieldAnalyzerWrapper(new StandardAnalyzer());
  }

  private void closeWriter() {
    try {
      getWriter().close();

    } catch (IOException e) {
    }
  }
} 

Do I need to use a different analyzer for that field?
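A quick way to see what StandardAnalyzer actually puts in the index for these IDs is to print the tokens it emits. This is a minimal diagnostic sketch, assuming lucene-core 8.11.x on the classpath; the class and method names (AnalyzerTokens, standardTokens) are illustrative and not part of Lucene.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerTokens {

  // Runs `text` through the analyzer for `field` and collects the emitted terms,
  // i.e. the exact values a TextField would store in the index.
  public static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
    List<String> result = new ArrayList<>();
    try (TokenStream stream = analyzer.tokenStream(field, text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        result.add(term.toString());
      }
      stream.end();
    }
    return result;
  }

  // Same analyzer as in the question's code, applied to the "ID" field.
  public static List<String> standardTokens(String text) {
    try (Analyzer analyzer = new StandardAnalyzer()) {
      return tokens(analyzer, "ID", text);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(standardTokens("ABC123")); // [abc123]
    System.out.println(standardTokens("123"));    // [123]
  }
}
```

StandardAnalyzer lowercases its tokens, so "ABC123" is indexed as "abc123", while "123" is unaffected by lowercasing.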


Answers (1)

Teodor Mysko


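After some investigation, I figured out that a Term is not analyzed, so its text must match the indexed term exactly. The ID field was added with TextField, so StandardAnalyzer tokenized and lowercased it: the index contains "abc123", while new Term("ID", "ABC123") looks for "ABC123", so the deletion silently matched nothing. Purely numeric IDs such as "123" are unchanged by lowercasing, which is why they worked. I changed TextField to StringField, which indexes the value as a single untokenized term, and update/delete then worked as expected. However, regular search by ID stopped working, because the search text is lowercased by the analyzer and no longer matches the untokenized value. So I ended up with two ID fields in the index: a tokenized one for external search and an untokenized one for internal use.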
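The two-field setup can be sketched as follows, against an in-memory index. It assumes lucene-core 8.11.x on the classpath and uses ByteBuffersDirectory instead of FSDirectory; the field name ID_KEY and the helper names are hypothetical. The deleteField parameter lets you compare a Term on the untokenized field with a Term on the tokenized one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TwoIdFields {

  // Hypothetical name for the untokenized ID copy used only for updates/deletes.
  public static final String ID_KEY = "ID_KEY";

  static Document buildDoc(String id, String name) {
    Document doc = new Document();
    // Tokenized copy: lowercased by StandardAnalyzer, used for regular searches.
    doc.add(new TextField("ID", id, Field.Store.YES));
    // Untokenized copy: indexed verbatim as a single term, used for Term lookups.
    doc.add(new StringField(ID_KEY, id, Field.Store.NO));
    doc.add(new TextField("NAME", name, Field.Store.YES));
    return doc;
  }

  // Indexes one document, "updates" it with a Term on `deleteField`,
  // and returns the number of live documents (1 means the old one was replaced).
  public static int countAfterUpdate(String id, String deleteField) {
    try (Analyzer analyzer = new StandardAnalyzer();
         Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(buildDoc(id, "original"));
        writer.commit();
        // The Term text is NOT analyzed, so it must equal the indexed term exactly.
        writer.updateDocument(new Term(deleteField, id), buildDoc(id, "test"));
      } // closing the writer commits the pending changes
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.numDocs();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(countAfterUpdate("ABC123", ID_KEY)); // 1: old document replaced
    System.out.println(countAfterUpdate("ABC123", "ID"));   // 2: "ABC123" != indexed "abc123"
    System.out.println(countAfterUpdate("123", "ID"));      // 1: "123" is unchanged by lowercasing
  }
}
```

The second call reproduces the original problem, and the third shows why numeric IDs appeared to work.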

Another way to perform the update is to delete the old documents with a Query via deleteDocuments() and then add the new ones:

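// QueryParser runs "ABC123" through the same analyzer as the indexed field,
// so the query becomes ID:abc123 and matches the lowercased token.
BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
QueryParser queryParser = new QueryParser("ID", getPerFieldAnalyzer());
queryBuilder.add(queryParser.parse("ABC123"), BooleanClause.Occur.FILTER);
getWriter().deleteDocuments(queryBuilder.build());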

List<Document> docs = new ArrayList<>();
Document doc = new Document();
doc.add(new TextField("ID", "ABC123", Field.Store.YES));
doc.add(new TextField("NAME", "test", Field.Store.YES));
docs.add(doc);
getWriter().addDocuments(docs);
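For reference, here is a self-contained sketch of the delete-then-add approach run against an in-memory index. It assumes lucene-core, lucene-analyzers-common and lucene-queryparser 8.11.x on the classpath; DeleteByQueryUpdate and its helpers are illustrative names, and the QueryParser.escape call is an addition not in the snippet above, used so that IDs containing query syntax characters are parsed literally.

```java
import java.io.IOException;
import java.util.Collections;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteByQueryUpdate {

  static Document buildDoc(String id, String name) {
    Document doc = new Document();
    doc.add(new TextField("ID", id, Field.Store.YES));
    doc.add(new TextField("NAME", name, Field.Store.YES));
    return doc;
  }

  // Adds `id`, then "updates" it by deleting via an analyzed query and re-adding.
  // Returns the number of live documents left in the index.
  public static int countAfterQueryUpdate(String id) {
    try (Analyzer analyzer = new StandardAnalyzer();
         Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(buildDoc(id, "original"));
        writer.commit();

        // The parser analyzes the ID exactly like the TextField did (lowercasing it).
        QueryParser parser = new QueryParser("ID", analyzer);
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        builder.add(parser.parse(QueryParser.escape(id)), BooleanClause.Occur.FILTER);
        writer.deleteDocuments(builder.build());
        writer.addDocuments(Collections.singletonList(buildDoc(id, "test")));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.numDocs();
      }
    } catch (IOException | ParseException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(countAfterQueryUpdate("ABC123")); // 1: old document deleted, new one added
  }
}
```

One trade-off: updateDocument()/updateDocuments() perform the delete and add atomically as seen by readers, whereas here they are two separate calls, so a reader opened in between could briefly see neither document.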

