Nishant

Reputation: 41

lots of columns in a table

A table in our schema already has 100 columns. We need to add some 600 more columns if we follow the horizontal storage approach. The alternative is vertical storage: creating a new table with referential integrity back to the 100-column table. But joining would be a problem, because the 100-column table has 53 million records and the new table would have many more than that. Which approach is better in this case?

I would like to add an interesting test case here. I added 600 columns to my table, which already had 87 columns and 53 million records. I then tried to update it in batches:

  • Time taken to update 1,000 records: 2.10 secs
  • Time taken to update 10,000 records: 5.57 secs
  • Time taken to update 1,000,000 records: 5.42 mins
  • Time taken to update 53 million records: 4.5 hrs (the tablespace was exhausted and we needed to extend it)

Can anyone suggest a faster method of update?
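A common way to keep batch updates from exhausting undo/tablespace is to commit after each batch so the database never holds one giant transaction. A minimal sketch of that pattern (table and column names are hypothetical, and sqlite3 stands in for the real database engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wide_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO wide_table (id, status) VALUES (?, ?)",
    [(i, "old") for i in range(1, 10001)],
)
conn.commit()

BATCH_SIZE = 1000
updated = 0
while True:
    # Update one batch of not-yet-updated rows, then commit, so the
    # transaction (and the undo/tablespace it needs) stays small.
    cur = conn.execute(
        "UPDATE wide_table SET status = 'new' "
        "WHERE id IN (SELECT id FROM wide_table WHERE status = 'old' LIMIT ?)",
        (BATCH_SIZE,),
    )
    conn.commit()
    if cur.rowcount == 0:
        break
    updated += cur.rowcount
```

The per-batch commit is the key point; the exact batch size worth using depends on the engine and the row width.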

Upvotes: 4

Views: 1582

Answers (5)

Stefan Steinegger

Reputation: 64628

Without offending anyone... I wonder whether the data you store in 100 columns times 53 million records is really normalized?

If not, you really should start doing so. You could probably reduce the number of rows a lot (for instance, the data could in theory be split into three tables of 1000, 1000, and 53 records; I know it is not as easy as that, but it shows how small the numbers could theoretically be). Most probably a 53-million-record table will still exist after normalization, but it can hopefully be kept narrow; it might even consist of nothing but foreign keys. Usually you never need all the data at once, so ideally you could run most queries against tables with just a few thousand records.
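A toy illustration of this idea (all table and column names are hypothetical, with sqlite3 standing in for the real engine): an attribute group that repeats across millions of rows moves into a small lookup table, and the big table keeps only a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small lookup table: one row per distinct country, instead of
-- repeating name/currency in every one of the 53 million rows.
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name TEXT,
    currency TEXT
);
-- The big table shrinks to its own fields plus a foreign key.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    country_id INTEGER REFERENCES country(country_id)
);
""")
conn.execute("INSERT INTO country VALUES (1, 'India', 'INR')")
conn.execute("INSERT INTO customer VALUES (100, 'Nishant', 1)")

# The join reassembles the wide view only when it is needed.
row = conn.execute("""
    SELECT c.name, co.name, co.currency
    FROM customer c JOIN country co USING (country_id)
""").fetchone()
```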

Don't be too afraid of joins if you normalize. In the end it will be faster anyway, although there are exceptions.

Upvotes: 1

invert

Reputation: 2076

Edit: This is actually a very interesting question; I'm curious about it now. I suggest doing some real-world tests, one big table vs. many tables, with as much data as possible. It might be worth the extra effort! Remember, even relational DBs can hang if designed poorly and there are millions of records (I experienced this while contracted to an underwriting company; not an easy thing to fix after the fact). So your one-table design could also work — the proof is in the testing.

53 million records? I hope you're using a real relational database engine like MySQL/SQL, which are designed to handle big tables.

600+ columns in a single table sounds like overkill to me. I assume it's not a one-to-many record structure, which is why you're opting for the all-in-one-row approach? Even so, it might be a better idea to have separate tables, depending on what your data is.

Upvotes: 1

theglump

Reputation: 11

You might consider using a column-oriented database. Take a look at HBase (http://hadoop.apache.org/hbase/); it's a distributed, column-oriented store modeled after Google's Bigtable.

Upvotes: 0

dnagirl

Reputation: 20456

Questions you need to ask yourself:

  • Do most of the fields in my wide row have default or empty values? If this is the case, a vertical schema may be more suitable.
  • When you query, do you usually need to retrieve all the fields from a row, or do fields naturally classify into groups? If this is the case, a horizontal schema is likely fine, but you'll probably want to chop your main table into subtables, each with a natural group of fields, and all in a 1:1 relationship with the main table.
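The second option can be sketched as follows (table and column names are hypothetical, with sqlite3 standing in for the real engine): a narrow main table holds the frequently queried fields, and each natural group of rarely used fields lives in a 1:1 subtable sharing the main table's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Main table keeps the frequently queried fields.
CREATE TABLE main (
    id INTEGER PRIMARY KEY,
    name TEXT
);
-- A natural group of rarely used fields lives in a 1:1 subtable
-- keyed by the main table's primary key.
CREATE TABLE audit_details (
    id INTEGER PRIMARY KEY REFERENCES main(id),
    created_by TEXT,
    notes TEXT
);
""")
conn.execute("INSERT INTO main VALUES (1, 'widget')")
conn.execute("INSERT INTO audit_details VALUES (1, 'admin', 'initial load')")

# Most queries hit only the narrow main table; the join cost is paid
# only when that group of fields is actually needed.
row = conn.execute(
    "SELECT m.name, a.notes FROM main m JOIN audit_details a ON m.id = a.id"
).fetchone()
```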

Upvotes: 2

Thomas Weller

Reputation: 11717

This highly depends on the nature of your data and how it is used.

Maybe it is appropriate to write your data to an XML document and then store the document in the DB...
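A minimal sketch of that approach (attribute names are made up, with sqlite3 standing in for the real database): the sparse extra attributes are serialized to one XML document and stored in a single TEXT column instead of 600 columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sparse attributes that would otherwise each need a column.
attrs = {"colour": "red", "weight": "12kg"}

# Serialize them into one XML document.
root = ET.Element("attributes")
for key, value in attrs.items():
    ET.SubElement(root, "attr", name=key).text = value
doc = ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, extra_xml TEXT)")
conn.execute("INSERT INTO item VALUES (1, ?)", (doc,))

# Reading the row back requires parsing the document in the application.
stored = conn.execute("SELECT extra_xml FROM item WHERE id = 1").fetchone()[0]
parsed = {e.get("name"): e.text for e in ET.fromstring(stored)}
```

The trade-off is that the database can no longer index or filter on the individual attributes without XML-aware functions.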

Upvotes: 0
