Rapsey

Reputation: 589

Storing arrays of integers in a database

I am creating a database that will store 100,000 users (and probably more in the future). While this obviously happens in a table with one row per user, every user can (and will) store hundreds of items. In a programming language this would mean each user has two arrays (or one two-dimensional array) of integers: one for the itemids and one for the amounts.

My instincts tell me to create a table to hold all these items, with rows like (userid, itemid, amount). However, this would result in a huge table: 200,000 users with 250 items each... that's 50 million entries in one table. This, plus the fact that the table will undergo continuous and rapid change, frightens me. (How rapid? I estimate up to 100 modifications per second.)
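As a concrete sketch, such a table might look like this (the names user_items, userid, itemid, and amount are illustrative, not from the question):

```sql
-- Hypothetical schema for the items table described above.
-- The composite primary key enforces one row per (user, item) pair.
CREATE TABLE user_items (
    userid INT UNSIGNED NOT NULL,
    itemid INT UNSIGNED NOT NULL,
    amount INT UNSIGNED NOT NULL,
    PRIMARY KEY (userid, itemid)
);
```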

Typically there will be anywhere between 100 and 2,000 concurrent users, all adding and removing items and modifying amounts. These actions can and will happen from application code.
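As a minimal sketch, those operations might translate into SQL like the following (reusing the illustrative user_items schema above; the values are placeholders, and the per-user item cap mentioned next would be checked in application code before the INSERT):

```sql
-- Add an item, or increase the amount if the user already holds it
-- (relies on the (userid, itemid) primary key)
INSERT INTO user_items (userid, itemid, amount)
VALUES (12345, 42, 1)
ON DUPLICATE KEY UPDATE amount = amount + 1;

-- Modify an amount
UPDATE user_items SET amount = 99
WHERE userid = 12345 AND itemid = 42;

-- Remove an item
DELETE FROM user_items
WHERE userid = 12345 AND itemid = 42;
```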

It is worth noting that there is a maximum number of items a user can store.

Are there any alternatives to using a separate table? Perhaps save the values in a formatted text string? Or is this one of the instances where using a MySQL database is actually a Bad Idea™?

Thank you for your time and insights.

Upvotes: 2

Views: 557

Answers (3)

HLGEM

Reputation: 96572

Putting data into one field as an array is almost always a mistake. It makes querying the data much harder and much more time-consuming, and makes it much less likely that indexes will be used. It would be OK if the values were just text and you would never need to find one or more elements of the array, but in my experience that situation is rarely encountered. Modern databases can handle 50 million records without even breaking a sweat; that's a small table in database terms.
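For example, with normalized rows in the illustrative user_items table from the question, finding every user holding a given item is a plain lookup that can use an index, whereas an array crammed into one text column forces a full scan plus string parsing (the itemids column here is hypothetical):

```sql
-- Normalized rows: can use an index on itemid
SELECT userid, amount FROM user_items WHERE itemid = 42;

-- Comma-separated list in one field: full table scan, no index use
SELECT userid FROM users WHERE FIND_IN_SET('42', itemids) > 0;
```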

Upvotes: 1

symcbean

Reputation: 48357

My instincts tell me to create a table to hold all these items

Your instincts are right.

1) avoid premature optimisation

2) don't break the rules of normalization unless you've got a very good and real reason to do so

3) why do you suspect that the multi-table approach will be faster?

that's 50 million entries in one table

So what? Even if you only have an index on userid, performance compared with a single table per user will not be noticeably slower (in practice, with 200,000 users, the one big table will be much, much faster, since the DBMS cannot comfortably keep an open file handle for every one of those tables!).
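To check that a per-user lookup on the big table stays index-driven, EXPLAIN shows the plan (using the illustrative schema from the question; the composite (userid, itemid) primary key serves userid lookups via its leftmost column):

```sql
-- Expect an index-based plan (type=ref on the primary key) that
-- touches only the ~250 rows belonging to this user
EXPLAIN SELECT itemid, amount
FROM user_items
WHERE userid = 12345;
```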

I estimate up to 100 modifications per second

Should be possible using MySQL and fairly basic hardware, but if it were me, and I wanted a bit of headroom, I'd go with a pair of mirrored SATA disks, tables on one mirror, indexes on the other.
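For MyISAM tables, MySQL can express that split directly at creation time via the DATA DIRECTORY and INDEX DIRECTORY table options (paths are illustrative; InnoDB ignores INDEX DIRECTORY):

```sql
-- MyISAM only: put the .MYD data file on one mirror and the
-- .MYI index file on the other
CREATE TABLE user_items (
    userid INT UNSIGNED NOT NULL,
    itemid INT UNSIGNED NOT NULL,
    amount INT UNSIGNED NOT NULL,
    PRIMARY KEY (userid, itemid)
) ENGINE=MyISAM
  DATA DIRECTORY = '/mnt/mirror1/mysql'
  INDEX DIRECTORY = '/mnt/mirror2/mysql';
```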

The only issue I'd be concerned about (which applies regardless of which of the 2 models you choose) is supporting 2000 concurrent connections. Do the connections have to be concurrent? Or can each user download a working set (optionally using an optimistic locking strategy) and close off the connection, then push back the changes on a new connection? If not, then you'll probably want a good whack of memory and CPU.
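One way to implement the optimistic locking mentioned above is a version column (hypothetical, building on the illustrative schema): each client remembers the version it read, and the write-back only succeeds if the row has not changed in the meantime:

```sql
-- Hypothetical version column for optimistic locking
ALTER TABLE user_items
    ADD COLUMN version INT UNSIGNED NOT NULL DEFAULT 0;

-- Push back a change read at version 3; zero affected rows means
-- someone else changed the row first, so re-read and retry
UPDATE user_items
SET amount = 75, version = version + 1
WHERE userid = 12345 AND itemid = 42 AND version = 3;
```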

But leaving aside whether to use one big table or lots of little ones, if this is the only use for the data, and access is not concurrent to particular data items, then why bother with a relational database at all? NoSQL or a shared filesystem might work just as well.

Upvotes: 5

Khaled

Reputation: 1194

It should be OK to do it as you described using two tables. The database should be able to handle millions of records.

The important points to look at:

1- Optimize your queries as much as possible.

2- Create the appropriate index(es) to speed up your queries.

3- Use InnoDB if you have concurrent read/update operations, as it supports row-level locking as opposed to MyISAM's table-level locking (see the sketch after this list).

4- Provide good hardware to support the database server.

5- Run the database server on a dedicated server if affordable.
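A sketch of points 2 and 3 against the illustrative user_items table from the question (the index name is made up):

```sql
-- Point 3: move an existing table to InnoDB so concurrent updates
-- take row-level locks instead of MyISAM's table-level locks
ALTER TABLE user_items ENGINE=InnoDB;

-- Point 2: a secondary index to support lookups by item
CREATE INDEX idx_items_itemid ON user_items (itemid);
```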

Upvotes: 0
