Reputation: 940
I currently have some code that looks through various datasets and models electronic relationships between them, e.g., a shared JSESSIONID.
I would like to model each user's interactions with an application where users have to submit a unique identifier, e.g., an email address.
While processing the application's logs, I see [email protected] use the application with JSESSIONID asdfghjkl. I then see [email protected] also use the application with JSESSIONID asdfghjkl. Finally, I see [email protected] use JSESSIONID qwertyuiop.
In my Go code, it's easy to process the logs, write out both [email protected] and [email protected] as nodes, and then write the JSESSIONID relationship between them:
MERGE (a:EMAIL {label:'[email protected]'})
MERGE (b:EMAIL {label:'[email protected]'})
MERGE (a)-[:asdfghjkl]-(b)
However, I don't know the best way to do this at scale: the application logs are 1 TB in size. The limitation is memory. I can't find all email addresses that used asdfghjkl as a session ID without processing all of the data, so I can't write out the relationships between them without running into memory constraints.
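To make the memory constraint concrete, here is a minimal Go sketch of the grouping step described above. It assumes each log line has already been parsed into a hypothetical `logRecord` (one email plus one session ID), since the real log format isn't shown; `emailPair`, `sharedSessionPairs`, and the example.com addresses are illustrative stand-ins. The `bySession` map must hold every session until the last log line is read, which is what stops fitting in memory at 1 TB.

```go
package main

import (
	"fmt"
	"sort"
)

// logRecord is a hypothetical parsed log line: one email seen with one
// session ID. The real log format is not shown in the question.
type logRecord struct {
	Email     string
	SessionID string
}

// emailPair is one relationship to write: two emails that shared a session.
type emailPair struct {
	A, B      string
	SessionID string
}

// sharedSessionPairs groups emails by session ID and returns every pair of
// distinct emails that used the same session. The bySession map has to keep
// every session until the whole input has been read, so its size grows with
// the number of distinct sessions and emails in the logs.
func sharedSessionPairs(records []logRecord) []emailPair {
	bySession := make(map[string]map[string]bool)
	for _, r := range records {
		if bySession[r.SessionID] == nil {
			bySession[r.SessionID] = make(map[string]bool)
		}
		bySession[r.SessionID][r.Email] = true
	}

	// Sort session IDs and emails so the output order is deterministic.
	sessions := make([]string, 0, len(bySession))
	for s := range bySession {
		sessions = append(sessions, s)
	}
	sort.Strings(sessions)

	var pairs []emailPair
	for _, s := range sessions {
		emails := make([]string, 0, len(bySession[s]))
		for e := range bySession[s] {
			emails = append(emails, e)
		}
		sort.Strings(emails)
		for i := 0; i < len(emails); i++ {
			for j := i + 1; j < len(emails); j++ {
				pairs = append(pairs, emailPair{emails[i], emails[j], s})
			}
		}
	}
	return pairs
}

func main() {
	// Hypothetical addresses standing in for the ones in the question.
	records := []logRecord{
		{"user1@example.com", "asdfghjkl"},
		{"user2@example.com", "asdfghjkl"},
		{"user3@example.com", "qwertyuiop"},
	}
	for _, p := range sharedSessionPairs(records) {
		fmt.Println(p.A, p.B, p.SessionID)
	}
}
```

With these records only the asdfghjkl session yields a pair; qwertyuiop has a single email and produces no relationship.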
What I would really like to do is write out something like the following, but this obviously fails:
MERGE (a:EMAIL {label:[email protected]}) (a)-[:asdfghjkl]
Then later: MERGE (b:EMAIL {label:[email protected]}) (b)-[:asdfghjkl]
Can I create these relationships with a query after the fact?
Upvotes: 0
Views: 51
Reputation: 30397
It sounds like you should model each JSESSIONID as a node rather than as a relationship. A node can be linked to any number of email addresses as they appear in the logs, so each log entry can be written independently of the others, and you can add a unique constraint on the id for fast lookups.
MERGE (a:EMAIL {label:'[email protected]'})
MERGE (b:EMAIL {label:'[email protected]'})
MERGE (jsid:JSESSIONID {id:'asdfghjkl'})
MERGE (a)-[:jsid]->(jsid)
MERGE (b)-[:jsid]->(jsid)
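Because each log line now only produces statements about its own email and session, the logs can be streamed without keeping sessions in memory. The Go sketch below is one way to do that, assuming the same hypothetical `logRecord` shape; `streamBatches`, `createConstraint`, `mergeBatch`, the constraint name `jsessionid_id`, and the batch size are all illustrative. It groups rows into batches for a single parameterized `UNWIND` query, and the constraint uses Neo4j 4.4+/5 syntax. Actually sending each batch to the database with a driver is left out.

```go
package main

import "fmt"

// logRecord is a hypothetical parsed log line: one email seen with one
// session ID.
type logRecord struct {
	Email     string
	SessionID string
}

// createConstraint is run once before loading so that MERGE on
// JSESSIONID.id is backed by an index (Neo4j 4.4+/5 syntax; the
// constraint name is hypothetical).
const createConstraint = `CREATE CONSTRAINT jsessionid_id IF NOT EXISTS
FOR (s:JSESSIONID) REQUIRE s.id IS UNIQUE`

// mergeBatch writes one batch of rows. Each row only links one email to
// one session node, so rows never depend on each other.
const mergeBatch = `UNWIND $rows AS row
MERGE (e:EMAIL {label: row.email})
MERGE (s:JSESSIONID {id: row.sid})
MERGE (e)-[:jsid]->(s)`

// streamBatches pulls records one at a time from next and hands them to
// flush in batches of at most size rows. Memory is bounded by the batch
// size, not by the number of sessions in the logs.
func streamBatches(next func() (logRecord, bool), size int,
	flush func(rows []map[string]any) error) error {
	if size < 1 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	rows := make([]map[string]any, 0, size)
	for {
		r, ok := next()
		if !ok {
			break
		}
		rows = append(rows, map[string]any{"email": r.Email, "sid": r.SessionID})
		if len(rows) == size {
			if err := flush(rows); err != nil {
				return err
			}
			// Start a fresh slice in case flush kept a reference to the old one.
			rows = make([]map[string]any, 0, size)
		}
	}
	if len(rows) > 0 {
		return flush(rows)
	}
	return nil
}

func main() {
	// Hypothetical records; in practice next would read and parse log lines.
	records := []logRecord{
		{"user1@example.com", "asdfghjkl"},
		{"user2@example.com", "asdfghjkl"},
		{"user3@example.com", "qwertyuiop"},
	}
	i := 0
	next := func() (logRecord, bool) {
		if i == len(records) {
			return logRecord{}, false
		}
		i++
		return records[i-1], true
	}

	fmt.Println(createConstraint)
	err := streamBatches(next, 2, func(rows []map[string]any) error {
		// A real loader would run mergeBatch here with {"rows": rows}.
		fmt.Printf("batch of %d rows for:\n%s\n", len(rows), mergeBatch)
		return nil
	})
	if err != nil {
		fmt.Println("load failed:", err)
	}
}
```

Repeated (email, session) pairs across the logs are harmless here, since MERGE matches the existing nodes and relationship instead of creating new ones.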
Queries to find all :EMAIL nodes using a specific JSESSIONID should then be quite fast:
MATCH (email:EMAIL)-[:jsid]->(jsid:JSESSIONID {id:'asdfghjkl'})
RETURN email
Upvotes: 1