RachelD

Reputation: 4089

MySQL: What's better for speed, one table with millions of rows or managing multiple tables?

I'm reworking an existing PHP/MySQL/JS/Ajax web app that processes a LARGE number of table rows for users. Here's how the page currently works.

  1. A user uploads a LARGE .csv file. The test one I'm working with has 400,000 rows (each row has 5 columns).
  2. PHP creates a brand-new table for this data and inserts the hundreds of thousands of rows (see the bulk-load sketch after this list).
  3. The page then sorts / processes / displays this data back to the user in a useful way. Processing includes searching, sorting by date and other columns, and redisplaying them without a huge load time (that's where the JS/Ajax comes in).
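
For step 2, bulk-loading the CSV with MySQL's `LOAD DATA` is usually much faster than issuing hundreds of thousands of individual `INSERT`s from PHP. A minimal sketch; the table name, column names, and file path (`csv_rows`, `col1`..`col5`, `/tmp/upload.csv`) are illustrative, not taken from the question:

```sql
-- Hedged sketch: bulk-load one uploaded CSV in a single statement.
LOAD DATA LOCAL INFILE '/tmp/upload.csv'
INTO TABLE csv_rows
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES                      -- skip a header row, if present
(col1, col2, col3, col4, col5);
```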

My question is: should this app be placing the data into a new table for each upload, or into one large table with an id for each file? I think the original developer was adding separate tables for speed purposes. Speed is very important for this.
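
For reference, the one-large-table variant would look roughly like this. A sketch only; the `uploads`/`csv_rows` names, the `file_id` tag, and the column layout are assumptions for illustration:

```sql
-- Hedged sketch of the single-table approach: every CSV shares one
-- table, and each row is tagged with the id of the upload it came from.
CREATE TABLE uploads (
    file_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uploaded_at DATETIME NOT NULL,
    last_used   DATETIME NOT NULL
) ENGINE=InnoDB;

CREATE TABLE csv_rows (
    file_id INT UNSIGNED NOT NULL,
    col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255),
    col4 VARCHAR(255), col5 VARCHAR(255),
    KEY idx_file (file_id)          -- every query filters on file_id first
) ENGINE=InnoDB;
```

With an index leading on `file_id` (or a composite like `(file_id, date_column)` for the date sorts), a query for one upload touches only that upload's rows, which is what keeps the single-table layout competitive with per-upload tables.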

Is there a faster way? Is there a better mousetrap? Has anyone ever dealt with this?

Remember, every .csv can contain hundreds of thousands of rows, and hundreds of .csv files can be uploaded daily. Though they can be deleted about 24 hrs after they were last used (I'm thinking a cron job; any opinions?).
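
The 24-hour cleanup could be a cron-driven PHP script (or MySQL's event scheduler) running something like the following, written against the hypothetical `uploads`/`csv_rows` schema sketched above:

```sql
-- Hedged sketch: purge loads not used in the last 24 hours.
-- Run from cron, or via CREATE EVENT if the event scheduler is enabled.
DELETE r
FROM csv_rows r
JOIN uploads u ON u.file_id = r.file_id
WHERE u.last_used < NOW() - INTERVAL 24 HOUR;

DELETE FROM uploads
WHERE last_used < NOW() - INTERVAL 24 HOUR;
```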

Thank you all!


Upvotes: 1

Views: 457

Answers (2)

Brian Agnew

Reputation: 272237

Given 10^5 rows per CSV and 10^2 CSVs per day, you're looking at 10 million rows per day (and you say you'll clear that data down regularly). That doesn't look like a scary figure for a decent db (especially given that you can index within tables, and not across multiple tables).

Obviously the most regularly used CSVs could very easily be held in memory for speed of access - perhaps even all of them (a very simple calculation based on next to no data gives me a figure of 1 GB if you flush everything over 24 hours old. 1 GB is not an unreasonable amount of memory these days).
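
The 1 GB figure falls out of a back-of-envelope estimate; the ~100 bytes/row is an assumption, since nothing in the question fixes the actual byte count. One way to pin a hot table in RAM is MySQL's MEMORY engine (which doesn't support TEXT/BLOB columns); table names here are illustrative:

```sql
-- Back-of-envelope, as comments:
--   10^5 rows/file * 10^2 files/day = 10^7 rows/day
--   5 columns at ~20 bytes each     = ~100 bytes/row (assumed)
--   10^7 rows * 100 bytes           = ~1 GB resident per day
CREATE TABLE csv_rows_hot LIKE csv_rows;
ALTER TABLE csv_rows_hot ENGINE=MEMORY;
INSERT INTO csv_rows_hot SELECT * FROM csv_rows WHERE file_id = 123;
```

In practice a sufficiently large `innodb_buffer_pool_size` keeps hot InnoDB pages cached in memory anyway, so an explicit MEMORY copy is optional.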

Upvotes: 2

Gordon Linoff

Reputation: 1269563

I would shy away from putting all the data into a single table for the simple reason that you cannot change the data structure.

Since the data is being deleted anyway and you don't have a requirement to combine data from different loads, there isn't an obvious reason for putting the data into a single table. The other argument is that the application now works. Do you really want to discover some requirement down the road that implies separate tables after you've done the work?

If you do decide on a single table, then use table partitioning. Since each user is using their own data, you can use partitions to separate each user load into a separate partition. Although there are limits on partitioned tables (such as no foreign keys), this will make accessing the data in a single table as fast as accessing the original per-upload tables.
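
MySQL's native partitioning would look roughly like this. A sketch under assumptions: names are illustrative, and it uses HASH partitioning to spread loads across a fixed set of partitions (strictly one partition per load would need LIST/RANGE partitions added by DDL on every upload):

```sql
-- Hedged sketch: one table, partitioned by the upload a row belongs to.
-- MySQL requires the partitioning column in every unique key,
-- hence the composite primary key.
CREATE TABLE csv_rows_part (
    file_id INT UNSIGNED NOT NULL,
    row_id  INT UNSIGNED NOT NULL,
    col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255),
    col4 VARCHAR(255), col5 VARCHAR(255),
    PRIMARY KEY (file_id, row_id)
) ENGINE=InnoDB
PARTITION BY HASH(file_id) PARTITIONS 64;
```

Queries that filter on `file_id` benefit from partition pruning, so each lookup scans only the partition(s) holding that load, much like having a table per upload.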

Upvotes: 2
