mzimmerman

Reputation: 940

Modeling relationships in Neo4j when they aren't known initially

I currently have some code that looks through various datasets and models electronic relationships between them (e.g., JSESSIONID).

I would like to model each user's interactions with an application where they have to submit unique identifiers (e.g., an email address).

In processing logs of the application, I see [email protected] use the application with JSESSIONID asdfghjkl. I then see [email protected] also use the application with JSESSIONID asdfghjkl. Finally, I see [email protected] use JSESSIONID qwertyuiop.

In my Go code, it's easy for me to process the logs and write out both [email protected] and [email protected] as nodes and then write the JSESSIONID relationship between them.

MERGE (a:EMAIL {label:[email protected]}) MERGE (b:EMAIL {label:[email protected]}) MERGE (a)-[:asdfghjkl]-(b)

However, I don't know the best way to do this at scale (e.g., when the application logs are 1 TB in size). The limitation is memory: I can't find all the email addresses that use asdfghjkl as a session ID without processing all the data, so memory constraints prevent me from writing out the relationships between them.

What I would really like to do is write out something as follows, but this obviously fails:

MERGE (a:EMAIL {label:[email protected]}) (a)-[:asdfghjkl]

Then later: MERGE (b:EMAIL {label:[email protected]}) (b)-[:asdfghjkl]

Can I create these relationships with a query after the fact?

Upvotes: 0

Views: 51

Answers (1)

InverseFalcon

Reputation: 30397

Sounds like you should model JSESSIONID as nodes rather than as relationships, as that will allow you to link the JSESSIONID to multiple email addresses, and you can add a unique constraint on the id for fast lookups.

MERGE (a:EMAIL {label:[email protected]}) 
MERGE (b:EMAIL {label:[email protected]}) 
MERGE (jsid:JSESSIONID {id:'asdfghjkl'})
MERGE (a)-[:jsid]->(jsid)
MERGE (b)-[:jsid]->(jsid)

Your queries to find all :EMAIL nodes using a specific JSESSIONID should be quite fast:

MATCH (email:EMAIL)-[:jsid]->(jsid:JSESSIONID {id:'asdfghjkl'})
RETURN email

Upvotes: 1
