Reputation: 35
I have the following structure.
CREATE
(`0` :Sentence {`{text`:'This is a sentence'}}) ,
(`1` :Word {`{ text`:'This' }}) ,
(`2` :Word {`{text`:'is'}}) ,
(`3` :Sentence {`{'text'`:'Sam is a dog'}}) ,
(`0`)-[:`RELATED_TO`]->(`1`),
(`0`)-[:`RELATED_TO`]->(`2`),
(`3`)-[:`RELATED_TO`]->(`2`)
So my question is this. I have a bunch of sentences that I have decomposed into word objects. These word objects are all unique and therefore will point to different sentences. If I perform a search for one word it's very easy to figure out all of the sentences that word is related to. How can I structure a query to figure out the same information for two words instead of one.
I would like to submit two or more words and find a path that includes all words submitted picking up all sentences of interest.
Upvotes: 1
Views: 98
Reputation: 67044
As an alternative, you could consider using manual indexing (also called "legacy indexing") instead of using Word
nodes and RELATED_TO
relationships. Manual indexes support "fulltext" searches using lucene.
There are many apoc procedures that help you with this.
Here is an example that might work for you. In this example, I assume case-insensitive comparisons are OK, you retain the Sentence
nodes (and their text
properties), and you want to automatically add the text
properties of all Sentence
nodes to a manual index.
If you are using neo4j 3.2+, you have to add this setting to the neo4j.conf
file to make some expensive apoc.index
procedures (like apoc.index.addAllNodes
) available:
dbms.security.procedures.unrestricted=apoc.*
Execute this Cypher code to initialize a manual index named "WordIndex" with the text
text from all existing Sentence
nodes, and to enable automatic indexing from that point onwards:
CALL apoc.index.addAllNodes('WordIndex', {Sentence: ['text']}, {autoUpdate: true})
YIELD label, property, nodeCount
RETURN *;
To find (case insensitively) the Sentence
nodes containing all the words in a collection (passed via a $words
parameter), you'd execute a Cypher statement like the one below. The WITH
clause builds the lucene
query string (e.g., "foo AND bar") for you. Caveat: since lucene's special boolean terms (like "AND" and "OR") are always in uppercase, you should make sure the words you pass in are lowercased (or modify the WITH
clause below to use the TOLOWER()` function as needed).
WITH REDUCE(s = $words[0], x IN $words[1..] | s + ' AND ' + x) AS q
CALL apoc.index.search('WordIndex', q) YIELD node
RETURN node;
Upvotes: 0
Reputation: 30417
I just remembered an alternate approach that may work better. Compare the PROFILE on this query with the profiles for the others, see if it works better for you.
WITH {myListOfWords} as wordList
WITH wordList, size(wordList) as wordCnt
MATCH (s)-[:RELATED_TO]->(w:Word)
WHERE w.text in wordList
WITH s, wordCnt, count(DISTINCT w) as cnt
WHERE wordCnt = cnt
RETURN s
Upvotes: 1
Reputation: 30417
Unfortunately it's not a very pretty approach, it basically comes down to collecting :Word nodes and using the ALL() predicate to ensure that the pattern you want holds true for all elements of the collection.
MATCH (w:Word)
WHERE w.text in {myListOfWords}
WITH collect(w) as words
MATCH (s:Sentence)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s
What makes this ugly is that the planner isn't intelligent enough right now to infer that when you say MATCH (s:Sentence) WHERE ALL(word in words ...
that the initial matches for s
ought to come from the match from the first w
in your words
collection, so it starts out from all :Sentence nodes first, which is a major performance hit.
So to get around this, we have to explicitly match from the first of the words
collection, and then use WHERE ALL()
for the remaining.
MATCH (w:Word)
WHERE w.text in {myListOfWords}
WITH w, size(()-[:RELATED_TO]->(w)) as rels
WITH w ORDER BY rels ASC
WITH collect(w) as words
WITH head(words) as head, tail(words) as words
MATCH (s)-[:RELATED_TO]->(head)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s
EDIT:
Added an optimization to order your w
nodes by the degree of their incoming :RELATED_TO relationships (this is a degree lookup on very few nodes), as this will mean the initial match to your :Sentence nodes is the smallest possible starting set before you filter for relationships from the rest of the words.
Upvotes: 0