Reputation: 2939
I started playing around with RavenDB a few days ago. I like it so far, but I am pretty new to the whole NoSQL world. I am trying to work out patterns for when to prefer it (or any other document database, or any NoSQL-style data store) over a traditional RDBMS. I do understand the rule of thumb "when you need to store documents or unstructured/dynamically structured data, opt for a document DB", but that feels far too general to grasp.
Why? Because from what I've read, the usual examples of "documents" are things like the order details of an e-commerce application or the form details of a workflow management application. But these have been built with RDBMSs for ages without much trouble; the details of an order, such as quantity, total price, discount, etc., are perfectly structured, for example.
So I think there's an overlap here. But I am not asking for general advice on when to use what, because I believe the best way for me is to figure it out by experimenting; so I am just going to ask about a concrete case, along with my concerns.
So let's say I am developing an instant messenger application which stores messages going back ages, like Facebook's messaging system does. I think using an RDBMS here is not suitable. My reason is the way most people use instant messaging systems like this.
The thing to note is that most messages are very short, so storing each in a single row with this structure:
Messages(fromUserId, toUserId, sent, content)
feels very inefficient, because the actual useful information (the content) is tiny, whereas the table would contain an enormous number of rows and the indexes would therefore grow huge. Add to this the fact that messages are sent very frequently, and the size of the indexes would have a serious impact on performance. So a very large number of rows must be stored and managed, while each row carries only a minimal amount of actual information.
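To make the concern concrete, here is a minimal sketch of that table in PostgreSQL-flavoured SQL (the column types, index name, and user IDs are my assumptions, not part of the original design). Note that fetching a conversation has to match both directions of the (from, to) pair:

-- Hypothetical schema; all types are assumptions.
CREATE TABLE Messages (
    id         BIGSERIAL PRIMARY KEY,
    fromUserId BIGINT    NOT NULL,
    toUserId   BIGINT    NOT NULL,
    sent       TIMESTAMP NOT NULL,
    content    TEXT      NOT NULL
);

-- The index a "conversation history" query relies on; it gains an
-- entry for every message sent, which is exactly the growth concern.
CREATE INDEX idx_messages_pair_sent
    ON Messages (fromUserId, toUserId, sent);

-- The 50 most recent messages between two users, covering both
-- directions of the pair.
SELECT sent, fromUserId, content
FROM Messages
WHERE (fromUserId = 19395 AND toUserId = 19396)
   OR (fromUserId = 19396 AND toUserId = 19395)
ORDER BY sent DESC
LIMIT 50;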
In RavenDB, I would use a structure such as this:
// a Conversation object
{
    "FirstUserId": "users/19395",
    "SecondUserId": "users/19396",
    "Messages": [
        {
            "Order": 0,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8140061",
            "Content": "lijhuttj t bdjiqzu "
        },
        {
            "Order": 1,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8200960",
            "Content": "pekuon eul co"
        }
    ]
}
With this structure, I only need to find the conversation I am looking for: the one between User A and User B. Any message between User A and User B is stored in this object, regardless of which of them was the sender. So once I find the conversation between them - and there are far fewer conversations than actual messages - I can just grab all of the messages associated with it.
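As an aside, one way to make that lookup a direct load instead of a query would be a deterministic document ID derived from the two user IDs, lower ID first, so either participant can compute it; this naming convention is an assumption of mine, not something RavenDB prescribes:

// id: "conversations/19395-19396" (assumed convention: lower user ID first)
{
    "FirstUserId": "users/19395",
    "SecondUserId": "users/19396",
    "Messages": [ ]
}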
However, if the two participants talk a lot (and assuming that messages are stored for, let's say, 3 years), there can be tens of thousands of messages in a single conversation, causing the object to grow very large.
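One pattern that would bound this growth - sketched here only as a hypothetical option, not as established practice - is to split a conversation into fixed-size chunk documents, so that reading the recent history only touches the latest chunk:

// Hypothetical chunk document holding at most, say, 100 messages.
// id: "conversations/19395-19396/chunks/42" (assumed naming scheme)
{
    "ConversationId": "conversations/19395-19396",
    "ChunkIndex": 42,
    "Messages": [
        { "Order": 4200, "Sender": "First", "Sent": "...", "Content": "..." }
    ]
}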
But there is one thing I don't know about how RavenDB works internally. Do its storage and query mechanisms allow the DB engine (not the client) to grab just, say, the 50 most recent messages without reading the whole object? After all, it indexes the properties of objects, but I haven't found any information about whether reading parts of an object is possible DB-side (that is, without the DB engine reading the whole object from disk, parsing it, and then sending back just the required parts to the client).
If that is possible, I think Raven is the better option in this scenario; if not, then I am not sure. So please help me clear this up by answering the question raised in the previous paragraph, along with any advice on which DB model would suit this scenario best. RDBMSs? Document DBs? Maybe something else?
Thanks.
Upvotes: 1
Views: 597
Reputation: 197
I would say the primary distinctions will be:
Note also that many modern cloud document databases (like Azure DocumentDB) can give you the best of both worlds, as they support geo-replication, schema-less documents, automatic indexing, guaranteed latencies, and SQL queries. SQL databases (like Amazon Aurora) can handle massive throughput rates, but usually still require more hand-holding from a DBA.
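For example, DocumentDB lets you run SQL over schema-less JSON, including querying inside the embedded Messages array of the Conversation documents from the question; the document and property names are taken from the question, and the query itself is an illustrative sketch:

SELECT TOP 50 m.Sent, m.Content
FROM Conversations c
JOIN m IN c.Messages
WHERE c.FirstUserId = "users/19395" AND c.SecondUserId = "users/19396"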
Upvotes: 1