J J Doe
J J Doe

Reputation: 35

Neo4j query for complete paths

I have the following structure.

CREATE 
(`0` :Sentence {`{text`:'This is a sentence'}}) ,
(`1` :Word {`{ text`:'This' }}) ,
(`2` :Word {`{text`:'is'}}) ,
(`3` :Sentence {`{'text'`:'Sam is a dog'}}) ,
(`0`)-[:`RELATED_TO`]->(`1`),
(`0`)-[:`RELATED_TO`]->(`2`),
(`3`)-[:`RELATED_TO`]->(`2`)

schema example

So my question is this. I have a bunch of sentences that I have decomposed into word objects. These word objects are all unique and therefore will point to different sentences. If I perform a search for one word it's very easy to figure out all of the sentences that word is related to. How can I structure a query to figure out the same information for two words instead of one.

I would like to submit two or more words and find a path that includes all words submitted picking up all sentences of interest.

Upvotes: 1

Views: 98

Answers (3)

cybersam
cybersam

Reputation: 67044

As an alternative, you could consider using manual indexing (also called "legacy indexing") instead of using Word nodes and RELATED_TO relationships. Manual indexes support "fulltext" searches using lucene.

There are many apoc procedures that help you with this.

Here is an example that might work for you. In this example, I assume case-insensitive comparisons are OK, you retain the Sentence nodes (and their text properties), and you want to automatically add the text properties of all Sentence nodes to a manual index.

  1. If you are using neo4j 3.2+, you have to add this setting to the neo4j.conf file to make some expensive apoc.index procedures (like apoc.index.addAllNodes) available:

    dbms.security.procedures.unrestricted=apoc.*
    
  2. Execute this Cypher code to initialize a manual index named "WordIndex" with the text text from all existing Sentence nodes, and to enable automatic indexing from that point onwards:

    CALL apoc.index.addAllNodes('WordIndex', {Sentence: ['text']}, {autoUpdate: true})
    YIELD label, property, nodeCount
    RETURN *;
    
  3. To find (case insensitively) the Sentence nodes containing all the words in a collection (passed via a $words parameter), you'd execute a Cypher statement like the one below. The WITH clause builds the lucene query string (e.g., "foo AND bar") for you. Caveat: since lucene's special boolean terms (like "AND" and "OR") are always in uppercase, you should make sure the words you pass in are lowercased (or modify the WITH clause below to use the TOLOWER()` function as needed).

    WITH REDUCE(s = $words[0], x IN $words[1..] | s + ' AND ' + x) AS q
    CALL apoc.index.search('WordIndex', q) YIELD node
    RETURN node;
    

Upvotes: 0

InverseFalcon
InverseFalcon

Reputation: 30417

I just remembered an alternate approach that may work better. Compare the PROFILE on this query with the profiles for the others, see if it works better for you.

WITH {myListOfWords} as wordList
WITH wordList, size(wordList) as wordCnt
MATCH (s)-[:RELATED_TO]->(w:Word) 
WHERE w.text in wordList
WITH s, wordCnt, count(DISTINCT w) as cnt
WHERE wordCnt = cnt
RETURN s

Upvotes: 1

InverseFalcon
InverseFalcon

Reputation: 30417

Unfortunately it's not a very pretty approach, it basically comes down to collecting :Word nodes and using the ALL() predicate to ensure that the pattern you want holds true for all elements of the collection.

MATCH (w:Word) 
WHERE w.text in {myListOfWords}
WITH collect(w) as words
MATCH (s:Sentence)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s

What makes this ugly is that the planner isn't intelligent enough right now to infer that when you say MATCH (s:Sentence) WHERE ALL(word in words ... that the initial matches for s ought to come from the match from the first w in your words collection, so it starts out from all :Sentence nodes first, which is a major performance hit.

So to get around this, we have to explicitly match from the first of the words collection, and then use WHERE ALL() for the remaining.

MATCH (w:Word) 
WHERE w.text in {myListOfWords}
WITH w, size(()-[:RELATED_TO]->(w)) as rels
WITH w ORDER BY rels ASC
WITH collect(w) as words
WITH head(words) as head, tail(words) as words
MATCH (s)-[:RELATED_TO]->(head)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s

EDIT:

Added an optimization to order your w nodes by the degree of their incoming :RELATED_TO relationships (this is a degree lookup on very few nodes), as this will mean the initial match to your :Sentence nodes is the smallest possible starting set before you filter for relationships from the rest of the words.

Upvotes: 0

Related Questions