Jonathan Nicholson

Reputation: 11

Neo4j python interface py2neo & data typing

I've got a curious issue with Neo4j and Python. It appears to be related to the typing of numbers written through the Python interface py2neo.

If I create a simple database using the Cypher commands:-

create (n:Type {name:"foo1"});
create (n:Type {name:"foo2"});
match (n:Type {name:"foo1"}), (n2:Type {name:"foo2"})
 create (n)-[r:NUMBER {name: "flow1", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow2", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow3", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow4", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow5", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow6", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow7", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow8", value: 1000000000}]->(n2),
  (n)-[:NUMBER {name: "flow9", value: 1000000000}]->(n2);

and run an aggregate query to sum the values of the relationships:-

match (n)-[r]->(n2) return n,sum(r.value),n2;

I get the expected result:-

+--------------------------------------------------------------------+
| n                        | sum(r.value) | n2                       |
+--------------------------------------------------------------------+
| Node[20103]{name:"foo1"} | 9000000000   | Node[20104]{name:"foo2"} |
+--------------------------------------------------------------------+

However if I populate the same dataset using this python script:-

#!/usr/bin/python

from py2neo import Graph, Path, Node, authenticate, Relationship

authenticate("localhost:7474", "neo4j", "password")

graph = Graph()


foo1 = Node('Type', name='foo1')
foo2 = Node('Type', name='foo2')

graph.create(foo1)
graph.create(foo2)

for i in range(1, 10):
    r = Relationship.cast(foo1, 'NUMBER', foo2, {'name': 'foo%d' % i, 'value': 1000000000})
    graph.create_unique(r)

and then run the same query, I get a slightly surprising result:-

neo4j-sh (?)$  match (n)-[r]->(n2) return n,sum(r.value),n2;
+--------------------------------------------------------------------+
| n                        | sum(r.value) | n2                       |
+--------------------------------------------------------------------+
| Node[20105]{name:"foo1"} | 410065408    | Node[20106]{name:"foo2"} |
+--------------------------------------------------------------------+

This is consistent with sum() being constrained to 32 bits: 9000000000 modulo 2^32 is exactly 410065408.
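
The 32-bit wrap-around can be checked with plain arithmetic. This is only a sketch of the arithmetic, not of what Neo4j or py2neo do internally; the variable names are made up for illustration.

```python
# Expected sum of nine relationships, each with value 1000000000.
expected = 9 * 1000000000

# Keep only the low 32 bits, as a 32-bit accumulator would (an assumption
# about the cause, not a confirmed description of Neo4j's sum()).
wrapped = expected % 2**32

print(expected)  # 9000000000
print(wrapped)   # 410065408
```

The wrapped value matches the sum reported for the py2neo-created data.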

If any value is >32bit the sum is correct, but if all would fit within 32 bits the sum returns the wrong answer.

Any help appreciated.

This is Python 2.7.6 with Neo4j 2.3.1 on Ubuntu 14.04 LTS.

Upvotes: 1

Views: 336

Answers (1)

Martin Preusse

Reputation: 9369

This works on your data added with py2neo:

match (n)-[r]->(n2) return n,sum(toInt(r.value)),n2;

And if you cast the input to float in your Python code the original Cypher query works (without toInt()):

...
r = Relationship.cast(foo1, 'NUMBER', foo2, { 'name': 'foo%d' % i, 'value': float(1000000000) } )
...

I would assume that running the Cypher query from the console and adding the relationships through py2neo create different data types?

This question explains what Neo4j uses internally; integers are stored as a Java long: "Cypher creates number as a long. How do I create an integer?"

You wrote: "If any value is >32bit the sum is correct, but if all would fit within 32 bits the sum returns the wrong answer."

I don't know much about Java data types, but shouldn't a long always be 64 bits? Or maybe the 1000000000 is stored as a 32-bit value and sum() then breaks if all values are 32-bit?
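
For reference, a small range check shows why the individual values and their sum behave differently. The bounds are those of Java's signed int (32-bit) and long (64-bit); the helper names are hypothetical, and the check says nothing about which type py2neo actually sends.

```python
# Signed ranges of Java's int (32-bit) and long (64-bit).
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def fits_int(value):
    """True if value is representable as a Java int."""
    return INT_MIN <= value <= INT_MAX

def fits_long(value):
    """True if value is representable as a Java long."""
    return LONG_MIN <= value <= LONG_MAX

single = 1000000000   # one relationship value
total = 9 * single    # the sum the query should return

print(fits_int(single))   # True: each value fits in 32 bits
print(fits_int(total))    # False: the sum needs more than 32 bits
print(fits_long(total))   # True: a 64-bit long holds the sum easily
```

So each value on its own fits in a 32-bit int, but their sum does not, which would explain the result if the values are stored and summed as 32-bit integers.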

Upvotes: 1
