arian1123
arian1123

Reputation: 236

Design a MySQL table(s) to store data of uploaded files

Im doing some redesigning of our database im building new tables to hold data about files users uploaded. The overlying issue here is that there are a bunch of different types of files users may upload. They may for example upload an mp3 file as a song, a profile picture, a profile cover photo, etc. Im running into a few design and practical issues, though, and am trying to figure out the best to do this. At the moment the main design looks something like this:

 ID | name | type | amazon_S3_info

ID: auto-increment an ID for every new upload.
name: name of the upload i.e. file name for example
type: what type of upload it is, for example a profile picture, a cover photo, audio file etc.
amazon_S3_info: Im storing all the files in an S3 and this field holds the data so I can generate the URL. I can't store a URL here, since im using signed urls and they always need to be regenerated with the data stored in this field.

After creating a table like this I could then just make matching tables where I, for example, create the relationships between a user ID and the upload ID of the profile picture they uploaded, etc. which is pretty simple.

My original idea was to break this whole thing up into multiple tables, meaning I would make 1 table for profile pictures, 1 for cover photos, etc. The reason this would become a bit of a headache on the php-side is that I have one standard function which uses an ID to retrieve the file URL for these files. If I have multiple tables then each type of upload would have 1 of the same ID in it, thus rendering my current URL retrieval useless. This is already in use all over the site and would be a nuisance to redo, however if it needs to it needs to be.

To be clear the idea here to break up into a few tables was speed. My logic is that it would more efficient to break a single table that could be 2,000,000 rows into a 4 tables of 500,000. It would be quicker to pull data from each one of those 500,000 rows table, or is that a false premise?

So my question to you lot is which database design is better, particularly when we are talking about scaling to be quite large?

Upvotes: 1

Views: 2226

Answers (1)

gimbel0893
gimbel0893

Reputation: 382

With databases (and computers in general), you're usually worried about factors of 10, not just 2x or 3x.

So splitting up the table by type into multiple tables, say 5 tables altogether instead of 1, ultimately won't solve your performance issues once the data grows extremely large. And like you said, it's a programming pain. (Basically you'd be sharding manually without an algorithm...if going to shard might as well use a hash shard algorithm to find the database/table).

The design you have is standard many to many. Index the tables correctly and that's the best you can do.

If performance becomes a problem, you need to scale horizontally. Relational datastores don't do this well, but NoSQL datastores do. You can have those types of references in NoSQL as well. If design changes are still possible, look into AWS DynamoDB (NoSQL service).

Edit: to respond to a comment...

@arian1123 In my experience, there's a point (table size) where all of a sudden mysql starts performing poorly. The more hardware (especially memory) you have, the larger the tables can grow before this happens. (The killer are joins. If you don't join big tables on big tables, then a big table by itself probably can grow very large with adequate hardware, I've dealt with 1Billion+ row tables where only reads were done with no joins, and it wasn't a problem.)

On your own laptop, you may see 100k tables performing fine, and 1M tables not. If the data isn't going to grow anymore and that's the power of hardware you'll have on production, then splitting would be a good idea. However if you're going to always be increasing table size, like 50M as you mention, then splitting it up would only help if you could split indefinetly (like every 2 million rows you divide the table again). In you're case you aren't wanting to continue to divide 1 table to 4 to 20 to 100...so I think it'd be better to leave as 1 table, and if it doesn't perform then look into other datastore types.

Upvotes: 3

Related Questions