Reputation: 165
I want to add a new column to a table which already consists billions of rows. The new columns are derived from existing columns.
For example,
new_col1 = old_col1 + old_col2
new_col2 = old_col1 / old_col2
I am trying to do this in following way -
Add new columns
ALTER TABLE table_name
ADD ( column_1 column-definition,
column_2 column-definition,
...
column_n column_definition )
Read rows one by one from the table and fill the values for new columns.
There is no primary key in the database. So I can not refer to an individual row. To read rows one by one, I have to do a select * which would give a huge resultset (considering billions of records).
Is there any better way to do this?
Upvotes: 3
Views: 2437
Reputation: 19842
Different DBMS have different SQL dialects, it is useful to specify which you are using in the question.
In SQL Server you could use a Computed Column but this would calculate the result every time you select the data, you could flag it as persisted but it may take a while to make the change. But you can't do that if you are going to remove the old columns.
Alternatively create the new column allowing nulls and then update it in batches
UPDATE TOP (1000) table_name SET new_col1 = old_col1 + col_col2 WHERE new_col1 IS NULL
Again the query is for SQL Server, but there will alternatives for your DBMS.
Also read Mr Hoopers comment about adding an index to the new column to make sure that the performance of the UPDATE
doesn't get worse as more data is added. The update is a read and write operation, the index will speed up the reads and slightly delay the writes (maintaining the index), but it should be worthwhile.
Upvotes: 2
Reputation: 22084
I think Mr Diver's method would be fine if you also added an index on one of your new columns; otherwise, as the job progresses, it will have to do more and more scanning to find the rows it hasn't already updated. Adding an index will mean it doesn't have to do that. A possible drawback is that the index differentiation will be frightful when the column is created, but I don't think that would be a problem as you only care about NULL or NOT NULL. You could drop the index when the update is complete.
Upvotes: 1
Reputation: 3254
use stored procedures, do an update by 100 of them, add the stored procedure as a job to run every say 30 seconds.
Upvotes: 0