Reputation: 55
I have a problem with a Cypher query on my Neo4j instance.
I have the following graph structure:
(d:Document)-->(t:Token)-->(l:Lemma)
A Document can have outgoing relationships to many Tokens, whereas a Token has always exactly one incoming relationship from a Document. A Token always has exactly one outgoing relationship to a Lemma, whereas a Lemma can have multiple incoming relationships from Tokens.
So the cardinalities are [Document]-n-1-[Token]-1-m-[Lemma].
For each Document in a given list documentIds, I want to count the number of distinct Tokens and distinct Lemmata in this pattern and divide the latter by the former. This should take into account that each Lemma can be connected to multiple Tokens in the pattern, and such Lemmata should not be counted multiple times.
My query so far looks like this:
MATCH (d:DOCUMENT)--(t:TOKEN)--(l:LEMMA)
WHERE d.id in {documentIds}
WITH d, count(DISTINCT l)/count(DISTINCT t) AS ttr
RETURN d.id AS id, ttr
I have the feeling that this counts the Lemmata and Tokens across documents, instead of counting for each document separately. Also, in my result ttr is 0.0 for each d.id.
I don't know if there is a way for me to provide you my database content. Is there some obvious mistake in the query?
EDIT:
I created a console:
http://console.neo4j.org/r/yqtrbx
In this case there are two Documents whose Tokens share one Lemma in common. For this graph I want the result to be 2/3 for the document with id 10023 and 2/2 for the document with id 10050. In a full document the difference between the Token count and the Lemma count is usually much higher.
Upvotes: 3
Views: 736
Reputation: 16375
The issue is that you are dividing two integers, and integer division in Cypher returns an integer. As a result, 2/3 evaluates to zero instead of the expected 0.66. To fix this, simply cast one of the integers to a float, like this:
match (d:DOCUMENT)-->(t:TOKEN)-->(l:LEMMA)
with d, count(distinct l) as cl, count(distinct t) as ct
return d, cl, ct, cl / toFloat(ct)
The result will be (based on your data set):
╒════════════╤════╤════╤══════════════════╕
│"d" │"cl"│"ct"│"cl / toFloat(ct)"│
╞════════════╪════╪════╪══════════════════╡
│{"id":10050}│2 │2 │1 │
├────────────┼────┼────┼──────────────────┤
│{"id":10023}│2 │3 │0.6666666666666666│
└────────────┴────┴────┴──────────────────┘
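To restrict this to the list of documents from the question, the WHERE clause can be put back in. The following is only a sketch: it assumes a list parameter named documentIds as in the question, and the aliases lemmaCount and tokenCount are chosen here for illustration. Note that in Cypher the non-aggregated expressions in a WITH or RETURN clause act as grouping keys, so with d as the only such expression the counts are computed per document, not across documents.

```cypher
// Sketch: assumes a list parameter named documentIds, as in the question.
// $documentIds is the current parameter syntax; older Neo4j versions used {documentIds}.
MATCH (d:DOCUMENT)-->(t:TOKEN)-->(l:LEMMA)
WHERE d.id IN $documentIds
// d is the only non-aggregated expression, so both counts are grouped per document.
WITH d, count(DISTINCT l) AS lemmaCount, count(DISTINCT t) AS tokenCount
// Casting one operand to float avoids integer division.
RETURN d.id AS id, toFloat(lemmaCount) / tokenCount AS ttr
```

Called with both ids from the sample graph, this should give the same ratios as in the table above, one row per d.id.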
Upvotes: 3