Reputation: 13
I set up the fulltext autoindex in my Neo4j db.
To be clear, in this case, Neo4j is always using Lucene to do the index. I have around 20 million nodes at the moment and that might soon increase to over 40 million nodes.
For most of the queries performance is fine, almost instant, but sometimes queries like
"*term*"
take up to 20 seconds to return.
Could you share some tips about optimizing Neo4j and Lucene to perform faster full text searches? Maybe I should modify some caching properties?
The basic config is well explained in the docs, but well-written guides on how to configure or alter Lucene's behaviour inside Neo4j are hard to find.
Upvotes: 1
Views: 275
Reputation: 18002
I think your main problem is that you're using a leading wildcard there. See other answers about Lucene performance in general with leading wildcards.
If you're only looking for a simple term, you may want to extract terms from documents and link them to separate nodes by that term, so that you can exploit graph connections to get to documents containing a term.
No matter how you build your index, a query like "*term*" gives the index no prefix to seek to, so it has to check just about every indexed term against the pattern, which is going to take a long time.
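To make the cost difference concrete, here is a minimal Python sketch. It uses a plain sorted list as a toy stand-in for an index's term dictionary, which is an assumption for illustration and not Lucene's actual data structure. The function names `prefix_matches` and `infix_matches` are hypothetical.

```python
from bisect import bisect_left

# Toy stand-in for a sorted term dictionary (illustrative only, not Lucene's real structure).
terms = sorted(["bar", "foo", "foobar", "food", "seafood", "zoo"])

def prefix_matches(prefix):
    """Query like 'foo*': binary-search to the first candidate, then read only the matching run."""
    start = bisect_left(terms, prefix)
    out = []
    for t in terms[start:]:
        if not t.startswith(prefix):
            break  # sorted order guarantees no later term can match
        out.append(t)
    return out

def infix_matches(fragment):
    """Query like '*foo*': there is no starting point to seek to, so every term is examined."""
    return [t for t in terms if fragment in t]
```

With a trailing wildcard the work is roughly proportional to the number of matching terms; with a leading wildcard it is proportional to the size of the whole dictionary, which is likely what hurts at 20 million nodes.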
You may want to look into tokenizing your documents and extracting those key terms, so that you in the end have something like this:
(d:Document)-[:contains]->(t:Term { term: "foo" });
Then, when you want to know which documents have "foo" in them, you don't do Lucene anymore, but this:
MATCH (t:Term {term: "foo"})<-[:contains]-(d:Document)
RETURN d;
I expect this would be much, much faster, but it would require you to do the term extraction on the front end, before loading the data. It will also work mostly for simple terms, and not for wildcard queries like [foo?o?o?bar].
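The term extraction step could look something like the Python sketch below. It is only an illustration under simple assumptions: documents arrive as a dict of id to text, terms are lowercase alphanumeric runs, and the helper names `extract_terms` and `build_term_rows` are made up for this example. A real setup would probably want stop-word removal or stemming on top.

```python
import re
from collections import defaultdict

def extract_terms(text):
    """Split text into distinct lowercase alphanumeric terms (a deliberately naive tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_term_rows(documents):
    """Map each term to the sorted ids of the documents that contain it.

    `documents` is assumed to be a dict of {doc_id: text}.
    """
    term_to_docs = defaultdict(set)
    for doc_id, text in documents.items():
        for term in extract_terms(text):
            term_to_docs[term].add(doc_id)
    return {term: sorted(ids) for term, ids in term_to_docs.items()}
```

Each (term, document id) pair in the result corresponds to one [:contains] relationship between a Document node and a Term node, so the output can be fed to whatever import path you already use.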
Upvotes: 2