Btibert3

Reputation: 40176

How would I model an email marketing graph in neo4j

I am really drawn (and new) to neo4j as a way to model my data for easier analysis. Part of my job requires that I analyze our email marketing efforts.

As a simple data model, I think of my graph as having 3 types of nodes:

  1. Lead - the customer in the database
  2. Email - the email sent to the customer(s)
  3. URL - a link contained in an email

With the relationships being:

  1. (Lead) -[:SENT]-> (Email)
  2. (Lead) -[:OPEN]-> (Email)
  3. (Lead) -[:CLICKED_THRU]-> (Email)
  4. (Email) -[:CONTAINS]-> (URL)

Now to my question. Using the data model I constructed above, how can I isolate the URLs that a Lead has clicked on? If I add another relationship, (Lead) -[:CLICKED_ON]-> (URL), I do not know which email the URL was contained in (we send the same URL in multiple emails).

Right now, I have a traditional RDBMS implementation, where I know which lead clicked on which URLs from each email.

I want to try to learn neo4j using this business problem, but I am struggling as to how to relate the URL that was clicked on to the specific email.

Thanks in advance for any help. If this is not the proper forum, please let me know where I can direct my question.

Upvotes: 1

Views: 540

Answers (2)

JohnMark13

Reputation: 3739

I assume that at the point the URL is accessed you know which email it came from, that an email contains multiple URLs, and that the same URL may be present in multiple emails. You may want to model a hyperedge (a relationship that links together more than two nodes) as a node:

(Lead)-[:CLICKED_URL_FROM]->(EmailLinkNode)
(EmailLinkNode)-[:FROM_EMAIL]->(email)
(EmailLinkNode)-[:CLICKED_URL]->(url)

I think that this is the only way to relate three Nodes in a single 'relationship', but I am quite new to this myself.
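As a rough sketch of how this click-as-node pattern might look in Cypher: the `Lead`, `Email` and `URL` labels follow the question, while the `EmailLinkNode` label, the `id` and `address` properties, and the sample values are assumptions made up for illustration. The two statements are separated by a semicolon, as accepted by cypher-shell.

```cypher
// Hypothetical sample data: one lead, one email, one URL.
MERGE (l:Lead {id: 1})
MERGE (e:Email {id: 100})
MERGE (u:URL {address: 'http://example.com/offer'})
MERGE (e)-[:CONTAINS]->(u)
// One intermediate node per click ties the lead, the email and the URL together.
CREATE (l)-[:CLICKED_URL_FROM]->(c:EmailLinkNode)
CREATE (c)-[:FROM_EMAIL]->(e)
CREATE (c)-[:CLICKED_URL]->(u);

// Which URLs did each lead click, and in which email?
MATCH (l:Lead)-[:CLICKED_URL_FROM]->(c:EmailLinkNode),
      (c)-[:FROM_EMAIL]->(e:Email),
      (c)-[:CLICKED_URL]->(u:URL)
RETURN l.id AS lead, e.id AS email, u.address AS url;
```

Because each click gets its own node, the same URL can appear in many emails without losing track of which email a given click came from.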

Something similar is described on the NeoTechnology page here.

Given your data, you could also consider creating a new node to represent the concept of an EmailUrl: a dummy node that uniquely identifies a URL within a specific email.

(email)-[:CONTAINS]->(EmailUrl)-[:FOR_URL]->(url)

This leads to a simple relationship between the lead and the now unique node (lead)-[:CLICKED_THRU]->(EmailUrl) and therefore simple queries to find out not only which urls were clicked, but which emails proved the most enticing to your leads.
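A minimal sketch of the EmailUrl variant, again assuming hypothetical `id` and `address` properties and made-up sample values:

```cypher
// Hypothetical sample data: the EmailUrl node stands for "this URL in this email".
MERGE (e:Email {id: 100})
MERGE (u:URL {address: 'http://example.com/offer'})
MERGE (e)-[:CONTAINS]->(eu:EmailUrl)-[:FOR_URL]->(u)
MERGE (l:Lead {id: 1})
MERGE (l)-[:CLICKED_THRU]->(eu);

// Count distinct leads per (email, URL) pair to see which emails drew the most clicks.
MATCH (l:Lead)-[:CLICKED_THRU]->(eu:EmailUrl)-[:FOR_URL]->(u:URL),
      (e:Email)-[:CONTAINS]->(eu)
RETURN e.id AS email, u.address AS url, count(DISTINCT l) AS leads
ORDER BY leads DESC;
```

Compared with one node per click, this keeps the graph smaller (one EmailUrl node per link per email), at the cost of not having a natural place to store per-click details.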

Upvotes: 1

FrobberOfBits

Reputation: 18022

The trouble with the model as you have it is that, for it to work, you must assume that each email has one and only one link. That may not generally be true.

Now, if it is true that every email has only one link, then you could do what you want this way:

MATCH (l:Lead)-[:CLICKED_THRU]->(e:Email)-[:CONTAINS]->(url:URL) RETURN l, url

This would tell you who clicked on which URL. But notice that if there's more than one URL per email, this would make it look like every user who ever clicked on an email link clicked on every link in that email.

A better way to model your data would be like this:

(Lead)-[:CLICKED_THRU]->(URL)
(Email)-[:CONTAINS]->(URL)
(Lead)-[:OPEN]->(Email)

This would let you ask which URLs were clicked (just by following CLICKED_THRU), and it would also tell you which emails were opened. Also, if each URL is unique to one email, following the :CONTAINS relationship tells you which email a clicked link came from.
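To illustrate that caveat, here is a sketch of a query against this model; the `id` and `address` properties are assumptions, not part of the original question.

```cypher
// For each clicked URL, list every email that contains it.
// If a URL was sent in several emails, candidateEmails holds more than one entry
// and the model cannot say which of them the click actually came from.
MATCH (l:Lead)-[:CLICKED_THRU]->(u:URL)<-[:CONTAINS]-(e:Email)
RETURN l.id AS lead, u.address AS url, collect(e.id) AS candidateEmails;
```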

Finally, for general modeling concerns in neo4j, make sure to check out this presentation, which goes into depth on how to think about graph modeling and how it differs from relational modeling.

Upvotes: 1
