Database too large - store as a row or serialise data?

Question

I have Quiz App that constitutes many Modules containing Questions. Each question has many Categories (many-to-many). Every time a quiz is completed, the user's score is sent to the Scores Table. (I've attached an entity-relation diagram for clarification purposes).

I have been thinking of breaking down the user scores according to categories (i.e. a user when completing a quiz will get an overall quiz score along with score for each category).

However, if each quiz consists of at least 30 questions, there could around 15-20 categories per quiz. So if one user completes a quiz, then it would create a minimum of 15-20 rows in the scores table. With multiple users, the Scores table would get really big really fast.

I assume this would affect the performance of retrieving data from the Scores table. For example, if I wanted to calculate the average score for a user for a specific category.

Does anyone have a better suggestion for how I can still be able to store scores based on categories?

I thought about serialising the JSON data, but of course, this has its limitations.

camba1 · Accepted Answer

The DB should be able to handle millions of rows and there is nothing inherently wrong with your design. A few things I would suggest:

Put indexes in the following (or combinations of) user id, exam id (which I assume is what you call scorable id ) exam type (scorable Type?) and creation date.
As your table grows, partition it. Potential candidates could be creation date buckets (by year or year/month would probably work well) or maybe if students are in particular classes you could have class buckets
As your table grow even more you could move the partitions to different different disks (how you partitioned the data will be even more crucial here because if the data has to go across too many partitions you may end up hurting performance instead of helping)

Beyond that another suggestion would be to break the scores table into two score and scoreDetail. The score table would contain top level stuff like user id ,exam id, overall score, etc... While the child table would contain the scores by category (philosophy, etc....). I would bet 80% of the time people only care about the top score anyways. This way you only reach out to the bigger table when some one wants to get the details of their score in a particular exam.

Finally, you probably want to have the score by category in rows rather than columns to make it easier to do analysis and aggregations, but this is not necessarily a performance booster and really depends on how you plan to use the data.

In the end though, the best optimizations really depend on how you plan to use your data. I would suggest just creating a random data set that represents a few years worth of data and play with that.

Database too large - store as a row or serialise data?

Answers (2)

Related Questions