mccraveiro
mccraveiro

Reputation: 55

RethinkDB Survey Modelling

I'm starting a new project (a survey application) and I chose RethinkDB as my database; however, I have some questions about data modelling. I'm going to have questions that can be answered only once by each user. Moreover I'll have reports that will tell the percentage of users that chose each option. At first I thought of the following modelling:

{
  title: String,
  total_answers: Number,
  options: [{
    value: Number,
    label: String,
    respondents: [User IDs]
  }]
}

The problem is that RethinkDB suggests embedding only a few hundred items on an Array and I'll probably have 500+ respondents per survey. The other option is to create an answers table and link to the question id and option but reports queries may be a problem as I'll have many questions per survey.

Which path should I follow? Thanks!

Upvotes: 0

Views: 124

Answers (1)

kureikain
kureikain

Reputation: 2314

First of all, this is a classic of data modeling between embedding vs join. Since it's classic, let recall what we already have, for reference:

Before we go ahead, let's agree that each solution has its own strong and weakness.

Now back to your question, as you wrote, embedding has its own issues. Embedding requires load the whole document into memory. Any queries you run on it will loaded the whole document. Also, when you change to the options.respondents array, RethinkDB will rewrite the whole document. You also will have many users answer the survey, which will be added to options.respondents at the same time. That means lots of write.

In my opinion, embedded is good for data that don't need to stand on it own, in the scope of application. Meaning the data that always to be used with its parent, and rarely we need access that own data separatly, it's good to embed.

For the data that needs frequent access, on its own, should belong to other table. And using JOIN to run reports, merge result.

As you write, you do want to run queries report, that a sign that you should put it separate. It gives great flexibility because you have thing on its own table, you don't have to dig into the array, and transform the data.

RethinkDB supports JOIN, and you can use eqJoin with index, or concatMap and getAll, also with index, make query more efficient. For you use case, I will say let go with JOIN.

Separating thing out maybe easier to run some aggregation. Such as count the number of users in system participate in surverys in first quarter of the year.

I still don't have an clear of idea of the kind of data that you have, if you can update your question about what kind of data you want to put in, I can help to create a data model around it.

Upvotes: 2

Related Questions