noob
noob

Reputation: 490

Struggling between relational data modeling and document data modeling

I am currently struggling on the data modeling. There are more than 1 million records in some tables and it takes quite a while to query the output via GROUP BY and COUNT. So I moved to Couchbase as it supports views and index which I found faster in querying data.

There is a great advantage in MySQL I found very useful. Such as I have a user in users table and some articles related to this users, also some likes and comments related to this article from many other users. I normally do a JOIN so an output would give me the article with user's name and profile image. The output also attached with other users' details with likes and comments. So if the user upload a new profile image or changed his email address, I only need to update the column in the users table.

In Couchbase, I tried to create documents as I store the data in MySQL such article documents have authors as an user_id and comment documents have commenter_id and article_id. Now I found it is super hard to join them in a view or index with limit and sort enabled. So I copied user's profile_img and first_name and last_name to all related documents. So when I load the article document it has below structure:

{ "article_id": 1234, "text": "A good article", "author_id": 1, "first_name": "John", "last_name": "Smith", "profile_img": "0bf34ee0a.jpg", "likes": [ { "user_id": 1, "first_name": "John", "last_name": "Smith", "profile_img": "0bf34ee0a.jpg" }, { "user_id": 2, "first_name": "Paul", "last_name": "Einstein", "profile_img": "1789ab00ef.jpg" } ] "comments": [ { "user_id": 1, "first_name": "John", "last_name": "Smith", "text": "This is my article", "profile_img": "0bf34ee0a.jpg" }, { "user_id": 2, "first_name": "Paul", "last_name": "Einstein", "text": "i like it", "profile_img": "1789ab00ef.jpg" } ] }

This surely saved my query time. (Otherwise I have to query the articles first, the extract the user ids from the articles and likes and comments and query the users the attach user details to the article and likes and comments according to the user ids). But this created another issue for me is that if user update his profile image, I have to crawl all articles to find his user_id and update the profile_img field.

Anyone has any clues of which way should I go?

Upvotes: 1

Views: 56

Answers (2)

NoSQLKnowHow
NoSQLKnowHow

Reputation: 4865

Read this blog post and see if this answers some of your questions and if not let's keep talking.

For your object model above, embedding the Likes and Comments into the user document would probably be a bad idea in the long run. While yes, you can use the sub-doc API to read/write that one part of the JSON, you pay for that on the back end via replication, etc., but also when it comes to document sizes as time goes by. More than likely you'd be better off putting each user's likes and comments into their own document. Even then, you'd have to content with how that document grew for active users.

One other thing. Should the comments and likes be related to what is being commented on or to the user commenting and liking? It might be worthwhile to have each comment in their own object with a standardized key pattern to identify it and then have another object that is an array of all of those object Ids that commented on that original think. Same for the likes. What you do depends on how you are going to access the data and specifically the performance and scaling needs you have for your application. What I mean by that is a schema design decision you'd make for a system that only get 500 operations a second can be VERY different than for one that does 200,000 a second. The key difference between accessing data in a NoSQL database as compared to a RDBMS is that with NoSQL it is far easier to model data exactly how your application and users will use the data, whereas in an RDBMS you have to model many times on what is best for the DB engine and how it will store and use the data.

Also, go read this post and this one. Keep in mind that the latter post on high write rates was written before N1QL in Couchbase, but it should give you some ideas to think about regardless.

Upvotes: 2

Related Questions