Reputation: 117
I'm currently working on a project where the client has handed me a database that includes a table with over 200 columns and 3 million rows of data. This is clearly poorly designed, and I'm exploring some options. I developed the app on my 2012 MacBook Pro with 16 GB of RAM and a 512 GB SSD. Since I had to build it with MVC4, I set up the development and test environment using Parallels 8 on OS X.

As part of the design, I built an interface that lets the client create custom queries against this large table: the UI sends a query string to the controller, which applies it via Dynamic LINQ, and the results are returned to the view as JSON (to populate a Kendo UI grid). On my MacBook Pro, queries run through this interface take at most 10 seconds (which I already find too slow) to return results to the grid. Likewise, when I run the same queries directly in SQL Server, they never take very long.
However, when I deployed the app to the client for testing, those same queries take in excess of 3 minutes. Long story short, the client will be upgrading the server hardware, but in the meantime they still need to test the app.
My question is about the table design. Despite holding 200 columns, each row is unique. More specifically, the current design is:
PK (GUID), OrganizationID (FK), 200 columns (tax fields)
If I redesign this to:
PK (GUID), OrganizationID (FK), FieldID (FK), Input
Field table: FieldID, FieldName
This would turn the 3-million-row table into one with 600 million rows but only a few columns. Will I see performance improvements?
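For concreteness, here is a minimal sketch of the two layouts as SQL Server DDL. All table and column names are my own illustrative placeholders, not the actual schema:

```sql
-- Current wide design: one row per record, ~200 tax-field columns.
CREATE TABLE dbo.TaxReturn (
    TaxReturnID    UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    OrganizationID UNIQUEIDENTIFIER NOT NULL,  -- FK to an Organization table
    Field001       NVARCHAR(255) NULL,
    Field002       NVARCHAR(255) NULL
    -- ... ~198 more tax-field columns
);

-- Proposed key/value redesign: one row per (record, field) pair,
-- i.e. 3M rows x 200 fields = 600M rows.
CREATE TABLE dbo.Field (
    FieldID   INT NOT NULL PRIMARY KEY,
    FieldName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.TaxValue (
    TaxValueID     UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    OrganizationID UNIQUEIDENTIFIER NOT NULL,  -- FK to an Organization table
    FieldID        INT NOT NULL REFERENCES dbo.Field(FieldID),
    Input          NVARCHAR(255) NULL
);
```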
Any insight would be appreciated. I understand normalization, but most of my experience is in programming.
Thanks in advance!
Upvotes: 2
Views: 2338
Reputation: 1269503
It is very hard to make any judgements without knowing the queries that you are running on the table.
Here are some considerations:
For testing purposes, just reduce the size of the table. That should speed things up.
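For example, one quick way to do that (table names are hypothetical) is to copy a sample of the wide table into a separate test table:

```sql
-- Copy a 100,000-row sample of the wide table into a test table.
-- SELECT ... INTO creates TaxReturn_Test with the same columns;
-- without an ORDER BY the sample is arbitrary, which is fine for testing.
SELECT TOP (100000) *
INTO dbo.TaxReturn_Test
FROM dbo.TaxReturn;
```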
As for your question about normalization: your denormalized structure takes up much less disk space than a normalized structure, because you do not need to repeat the keys for each value. If you are looking for one value on one row, normalization will not help you. You will still need to scan the index to find the row and then load it, and the row will be on one page regardless of whether it is normalized or denormalized. In fact, normalization might be worse, because the index will be much larger.
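To make that concrete, here is roughly what fetching one record looks like under each design, assuming a wide table `TaxReturn` and a key/value table `TaxValue` along the lines described in the question (all names hypothetical):

```sql
DECLARE @OrganizationID UNIQUEIDENTIFIER = NEWID();  -- placeholder value

-- Wide design: a single seek returns the whole record from one row.
SELECT *
FROM dbo.TaxReturn
WHERE OrganizationID = @OrganizationID;

-- Key/value design: the same record is now 200 rows, and rebuilding
-- its original shape requires a join, with a much larger index to search.
SELECT f.FieldName, v.Input
FROM dbo.TaxValue AS v
JOIN dbo.Field AS f ON f.FieldID = v.FieldID
WHERE v.OrganizationID = @OrganizationID;
```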
There are some examples of queries where normalizing the data will help. But, in general, you already have a more efficient data structure if you are fetching the data by rows.
Upvotes: 1
Reputation: 39767
You can take a paging approach. There will be two queries: the initial query returns all matching rows, but only the column with their unique IDs. That array can be split into pages, say 100 IDs per page. When the user selects a specific page, you pass those 100 IDs to the second query, which this time returns all 200 columns, but only for the 100 requested rows. This way you never return all the columns across all the rows at once, which should yield a significant performance boost.
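A sketch of the two queries in SQL Server terms, with hypothetical table and column names (in the app they would be issued through Dynamic LINQ):

```sql
DECLARE @OrganizationID UNIQUEIDENTIFIER = NEWID();  -- placeholder filter value

-- Query 1: run the user's filter once, returning only the unique IDs.
SELECT TaxReturnID
FROM dbo.TaxReturn
WHERE OrganizationID = @OrganizationID;  -- stands in for the user's dynamic filter

-- The app splits those IDs into pages of ~100. When the user opens a
-- page, it sends that page's IDs back for the second query.
DECLARE @PageIds TABLE (TaxReturnID UNIQUEIDENTIFIER PRIMARY KEY);
-- (the app inserts the ~100 IDs for the requested page here)

-- Query 2: fetch all 200 columns, but only for the requested rows.
SELECT t.*
FROM dbo.TaxReturn AS t
JOIN @PageIds AS p ON p.TaxReturnID = t.TaxReturnID;
```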
Upvotes: 0