Aman Kumar

Reputation: 137

Neo4j Optimisation for Creating Relationships Between Nodes

I am working with around 50,000 tweets stored as nodes, each with data similar to the example below.

{ "date": "2017-05-26T09:50:44.000Z", "author_name": "djgoodlook", "share_count": 0, "mention_name": "firstpost", "tweet_id": "868041705257402368", "mention_id": "256495314", "location": "pune india", "retweet_id": "868039862774931456", "type": "Retweet", "author_id": "103535663", "hashtag": "KamalHaasan" }

I tried to create relationships between tweets that have the same location using the following query:

MATCH (a:TweetData),(b:TweetData)
WHERE a.location = b.location AND NOT a.tweet_id = b.tweet_id
CREATE (a)-[r:SameLocation]->(b)
RETURN r

With this query I was not able to create the relationships: it ran for more than 20 hours and still had not produced a result. A similar query for the hashtag relationship worked fine and took around 5 minutes. Is there any other method to create these relationships, or any way to optimise this query?

Upvotes: 0

Views: 50

Answers (1)

InverseFalcon

Reputation: 30397

Yes. First, make sure you have an index on :TweetData(location). That is the most important change: without it, finding the matches for each of the 50k :TweetData nodes means scanning all 50k :TweetData nodes again for a common location, which is about 50k^2 (roughly 2.5 billion) comparisons.

Next, it's better to require that one node's tweet_id is less than the other's rather than just testing that they differ. Otherwise every pair of nodes is matched twice, with only the order reversed, and you end up with two relationships per pair, one in each direction, instead of the single relationship you want.

Lastly, do you really need to return all the relationships? That may kill your browser. Consider returning just the count of relationships added.
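As a rough sketch of that first step, the index could be created like this before running the relationship query. The exact syntax depends on the Neo4j version, which the question does not state, so both forms here are assumptions about your setup; run only the one your version accepts.

```cypher
// Neo4j 3.x syntax
CREATE INDEX ON :TweetData(location);

// Neo4j 4.x and later use this form instead:
// CREATE INDEX FOR (t:TweetData) ON (t.location);
```

Prefixing the relationship query with EXPLAIN (which shows the plan without executing anything) should then indicate whether the planner uses an index seek on location rather than a label scan for the second node.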

MATCH (a:TweetData)
MATCH (b:TweetData)
WHERE a.location = b.location AND a.tweet_id < b.tweet_id
CREATE (a)-[r:SameLocation]->(b)
RETURN count(r)

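One other thing to (strongly) consider: instead of tracking common locations with pairwise relationships, create a :Location node and link all :TweetData nodes to it. A location shared by n tweets needs n(n-1)/2 :SameLocation relationships, but only n relationships to a single :Location node.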

You will need an index or unique constraint on :Location(name), then:

MATCH (a:TweetData)
MERGE (l:Location {name:a.location})
CREATE (a)-[:LOCATION]->(l) 
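One possible way to set up the constraint mentioned above is sketched here; it would be run once, before the MERGE query. The constraint syntax is version-dependent, so the form used is an assumption about your Neo4j version. The last query is a hypothetical illustration of how tweets sharing a location can be found through the :Location node once the links exist; the tweet_id value is taken from the sample in the question.

```cypher
// A unique constraint also creates a backing index on :Location(name).
// Neo4j 3.x / 4.x syntax:
CREATE CONSTRAINT ON (l:Location) ASSERT l.name IS UNIQUE;
// Neo4j 4.4 and later:
// CREATE CONSTRAINT FOR (l:Location) REQUIRE l.name IS UNIQUE;

// After the :LOCATION relationships exist, count the other tweets
// that share a location with a given tweet.
MATCH (a:TweetData {tweet_id: "868041705257402368"})-[:LOCATION]->(l:Location)<-[:LOCATION]-(b:TweetData)
RETURN l.name AS location, count(b) AS otherTweets;
```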

This approach also lends itself more easily to batching, if 50k nodes at once is too much. You can add SKIP and LIMIT (via a WITH clause) after the MATCH on a.
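A minimal sketch of one such batch follows. The batch size of 10000, the ORDER BY on tweet_id, and the null check are illustrative assumptions rather than part of the original query: the ordering is there so that successive SKIP values do not overlap, and the null check is there because MERGE rejects a null property value.

```cypher
// Batch 1: process the first 10000 tweets.
// For later batches, raise SKIP to 10000, 20000, ... and re-run.
MATCH (a:TweetData)
WHERE a.location IS NOT NULL      // MERGE fails on a null property value
WITH a
ORDER BY a.tweet_id               // stable order so batches do not overlap
SKIP 0 LIMIT 10000
MERGE (l:Location {name: a.location})
CREATE (a)-[:LOCATION]->(l)
RETURN count(*) AS linked;
```

Because the relationship is created with CREATE, re-running the same batch would add duplicate :LOCATION relationships; switching that line to MERGE is one way to make a batch safe to repeat.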

Upvotes: 1
