Reputation: 2777

Which approach is best for storing a list of words in mysql that will later be used for statistics?

DETAILS

I have a quiz (let’s call it quiz1). Quiz1 uses the same wordlist each time it is generated. If the user needs to, they can skip words to complete the quiz. I’d like to store those skipped words in mysql and then later perform statistics on them.

At first I was going to store the missed words in one column as a string. Each word would be separated by a comma.

|testid |       missedwords                     | score     |   userid  |
*************************************************************************
| quiz1 | wordlist,missed,skipped,words         |  59       |   1       |
| quiz2 | different,quiz,list                   |  65       |   1       |

The problem with this approach is that I want to show statistics at the end of each quiz about which words were most frequently missed by users who took quiz1.
I’m assuming that storing missed words in one column as above is inefficient for this purpose as I'd need to extract the information and then tally it -(probably tally using php- unless I stored that tallied data in a separate table).

I then thought perhaps I need to create a separate table for the missed words The advantage of the below table is that it should be easy to tally the words from the table below.

|Instance|  missed word     |
*****************************
|   1    |  wordlist        |
|   1    |  missed          |
|   1    |  skipped         |

Another approach I could create a table with tallys and update it each time quiz1 was taken.

Testid  |   wordlist|   missed| skipped| otherword|
**************************************************
Quiz1   |        1  |        1|       1| 0        |

The problem with this approach is that I would need a different table for each quiz, because each quiz will use different words. Also information is lost because only the tally is kept not the related data such which user missed which words.

Question

Which approach would you use? Why? Alternative approaches to this task are welcome. If you see any flaws in my logic please feel free to point them out.

EDIT Users will be able to retake the quiz as many times as they like. Their information will not be updated, instead a new instance would be created for each quiz they retook.

Upvotes: 7

Answers (5)

markus

Reputation: 40685

The best way to do this is to have the word collection completely normalized. This way, analyses will be easy and fast.

quiz_words with wordID, word
quiz_skipped_words with quizID, userID, wordID

To get all the skipped words of a user:

SELECT wordID, word 
FROM quiz_words 
JOIN quiz_skipped_words USING (wordID) 
WHERE userID = ?;

You could add a group by clause to have group counts of the same word.

To get the count of a specific word:

SELECT COUNT(*) 
FROM quiz_words 
WHERE word LIKE '?';

Upvotes: 3

Jon

Reputation: 4736

Keeping as much information as possible (and being able to compile user-specific stats later as well as overall stats now) I would create a table structure similar to:

                  Stats
quizId  |   userId  |     type|    wordId|
******************************************
1       |        1  |   missed|         4|
1       |        1  |  skipped|         7|

Where type can either be an int defining the different types of actions, or a string representation - depending on if you believe it can ever be more. ^^

Then:

     Quizzes
quizId  |  quizName|
********************
       1|    Quiz 1|

With the word list made for each quiz like:

     WordList (pk: wordId)
quizId  |   wordId|   word|
***************************
      1 |       1 |    Cat|
      1 |       2 |    Dog|

You would have your user table however you want, we are just linking the id from it in to this system.

With this, all id fields will be non-unique keys in the stats table. When a user skips or misses a word, you would add the id of that word to the stats table along with relevant quizId and type. Getting stats this way would make it easy as a per-user basis, a per-word basis, or a per-type basis - or a combination of the three. It will also make the word list for each quiz easily available as well for making the quizzes. ^^

Hope this helps!

Upvotes: 1

Stephan

Reputation: 8090

The best solution (from my pov) for what are you trying to achieve is the normalized aproach:

test table which has test_id column and other columns
missed_words table which has id (AI PK) and word (UQ) , here you can also have a hits column that should be incremented each time that a association to this word is made in test_missed_words table this way you have the stats that you want already compiled and you don't need them to be calculated from a select query
test_missed_words which is a link table that has test_id and missed_word_id (composite PK)

This way you do not have redundant data (missed words) and you can extract easily that stats that you want

Upvotes: 1

MadDokMike

Reputation: 182

why not have a quiz table and quiz_words table, the quiz_words table would store id,quizID,word as columns. Then for each quiz instance create records in the quiz_words table for each word the user did use.

You could then run mysql counts on the quiz_words table based on quizID and or quiz type

Upvotes: 1

BlitZ

Reputation: 12168

According to database normalization theory, second approach is better, because ideally one relational table cell should store only one value, which is atomic and unsplitable. Each word is an entity instance.

Also, I might suggest to not create Quiz-Word tables, but reserve another column in Missed-Word table for quiz, for which this word was specified, then use this column as a foreign key for Quiz table. Then you probably may avoid real time table generation (which is a "bad practice" in database design).

Upvotes: 1

Which approach is best for storing a list of words in mysql that will later be used for statistics?

Answers (5)

Related Questions