Reputation: 472
I translated the below SQL query to cypher. group by
in cypher is implicit and it causes confusion and more query execution time. My SQL query is:
INSERT INTO tmp_build
(result_id, hshld_id, product_id)
SELECT b.result_id, a.hshld_id, b.cluster_id
FROM fact a
INNER JOIN productdata b ON a.product_id = b.barcode
WHERE b.result_id = 1
GROUP BY b.result_id, a.hshld_id, b.cluster_id;
The equivalent cypher query is:
MATCH (b:PRODUCTDATA {RESULT_ID: 1 })
WITH b
MATCH (b)<-[:CREATES_PRODUCTDATA]-(a:FACT)
WITH b.RESULT_ID as RESULT_ID , collect(b.RESULT_ID) as result, a.HSHLD_ID as HSHLD_ID,
collect(a.HSHLD_ID) as hshld, b.CLUSTER_ID as CLUSTER_ID, collect(b.CLUSTER_ID) as cluster
CREATE (:TMP_BUILD { RESULT_ID:RESULT_ID , HSHLD_ID:HSHLD_ID , PRODUCT_ID:CLUSTER_ID });
This query is running slow because of collect()
. Without collect function is not giving me the group by results. Is there any way to optimise it? or better implementation of group by in cypher?
Upvotes: 0
Views: 174
Reputation: 30397
In the Cypher query, you are attempting to return rows with both a singular values (RESULT_ID, HSHLD_ID, CLUSTER_ID) and their collections, but since you're returning both, your collections will only have the same value repeated the number of times it occurred in the results (for example, RESULT_ID = 1, result = [1,1,1,1]). I don't think that's useful for you.
Also, nothing in your original query seems to suggest you need aggregations. Your GROUP BY columns are the only columns being returned, there are no aggregation columns, so that seems like you just need distinct rows. Try removing the collection columns from your Cypher query, and use WITH DISTINCT
instead of just WITH
.
If that doesn't work, then I think you will need to further explain exactly what it is that you are attempting to get as the result.
Upvotes: 2