I am trying to build a small file and email search engine, and I'd also like to use more advanced queries for the full-text search. Hence I am looking at Lucene indexes. From what I have seen, there are two approaches: node_auto_index and apoc.index.addNode.
Setting up the index works fine, and indexing nodes with small properties works. However, when I try to index nodes with properties larger than 32k, Neo4j fails (and gets into an unusable state).
The error message boils down to:
WARNING: Failed to invoke procedure apoc.index.addNode: Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text_e" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101]...', original message: bytes can be at most 32766 in length; got 40000
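The numbers in the error are the first 30 bytes of the rejected term, and decoding them shows what happened. They spell "neo neo neo ...", spaces included. That suggests the whole property value reached the "text_e" field as a single term rather than being split into words. The sketch below decodes that prefix and checks a value against the 32766-byte limit quoted in the message. The test value ("neo " repeated 10,000 times) is an assumption. It was picked because it matches both the prefix and the reported "got 40000".

```python
# Byte prefix copied verbatim from the IllegalArgumentException above.
prefix = [110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32,
          110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32,
          110, 101, 111, 32, 110, 101]

# Maximum length of a single indexed term, in UTF-8 bytes (from the error text).
MAX_TERM_BYTES = 32766


def utf8_len(text: str) -> int:
    """Length of text in UTF-8 bytes, which is what the limit is measured in."""
    return len(text.encode("utf-8"))


decoded = bytes(prefix).decode("utf-8")
print(repr(decoded))  # 'neo neo neo neo neo neo neo ne'

# Assumed test value: "neo " repeated 10,000 times (ASCII, so 1 byte per char).
value = "neo " * 10_000
print(utf8_len(value))                   # 40000
print(utf8_len(value) > MAX_TERM_BYTES)  # True
```

The limit counts bytes, not characters. Non-ASCII text (for example in emails) therefore hits it with fewer than 32766 characters.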
I have checked this on Neo4j 3.1.2 and 3.1.0, with APOC 3.1.0.3.
A much longer description of the problem can be found at https://baach.de/Members/jhb/neo4j-full-text-indexing.
Is there any way to fix this? For example, have I done something wrong, or is there something I need to configure?
Thx a lot!
Neo4j does not support index values longer than about 32k (32766 bytes in UTF-8) because of an underlying Lucene limitation. For details on this, see https://github.com/neo4j/neo4j/pull/6213 and https://github.com/neo4j/neo4j/pull/8404. You need to split such long values into multiple terms.
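A minimal sketch of the splitting step, done in application code before the value is handed to the index. The helper name `split_utf8` is hypothetical, and it assumes the 32766-byte limit from the error message. It cuts only on character boundaries, so multi-byte characters are never broken. It does not avoid cutting a word in half at a chunk edge; breaking at the last whitespace would be a natural refinement. How the chunks are then stored is also an assumption left open here: separate properties on one node, or one node per chunk.

```python
MAX_TERM_BYTES = 32766  # assumed Lucene per-term limit, in UTF-8 bytes


def split_utf8(text: str, max_bytes: int = MAX_TERM_BYTES) -> list[str]:
    """Hypothetical helper: split text into pieces of at most max_bytes UTF-8 bytes.

    Splits only between characters, so no multi-byte character is cut.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            # Adding this character would exceed the limit: close the chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


value = "neo " * 10_000
parts = split_utf8(value)
print(len(parts))                               # 2
print([len(p.encode("utf-8")) for p in parts])  # [32766, 7234]
```

Each piece is then small enough to be indexed on its own, and joining the pieces restores the original string.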