Reputation: 2759
I have a table with around 25-30 million rows and 15 columns. Most of the columns are static or get real-time updates, but for a few of them (3 to 4) I get the data from HDFS once a day, which means those columns need to be updated once a day.
I tried creating a new table from the HDFS output and doing a join against the main table, but that seems to take forever. I also tried doing batch updates, but that is far too slow as well.
What is the best way to do this?
I am using Postgres as my DB.
Upvotes: 0
Views: 308
Reputation: 895
Instead of updating the old table, I would try creating a new table with the CREATE TABLE AS ... command, using a join between the existing table and the table you create from the batch data.
http://www.postgresql.org/docs/9.4/static/sql-createtableas.html
If you place that inside a transaction (you can wrap it in a function), you can then just rename the tables to switch them, or drop the old one and rename the new one.
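For illustration, a minimal sketch of that approach, assuming the main table is main_table with primary key id, the daily HDFS data is already loaded into daily_batch, and col_a/col_b are the columns refreshed once a day (all names are placeholders for your schema):

    BEGIN;

    -- Build the replacement table in one pass: keep the static columns
    -- from the current table and take the refreshed columns from the
    -- batch table where a match exists.
    CREATE TABLE main_table_new AS
    SELECT m.id,
           m.static_col,
           COALESCE(b.col_a, m.col_a) AS col_a,
           COALESCE(b.col_b, m.col_b) AS col_b
    FROM   main_table m
    LEFT JOIN daily_batch b ON b.id = m.id;

    -- CREATE TABLE AS does not copy indexes or constraints, so recreate them.
    ALTER TABLE main_table_new ADD PRIMARY KEY (id);
    -- CREATE INDEX ... ON main_table_new (...);

    -- Swap the tables.
    ALTER TABLE main_table RENAME TO main_table_old;
    ALTER TABLE main_table_new RENAME TO main_table;

    COMMIT;

    -- Once you are happy with the result, drop the old copy.
    DROP TABLE main_table_old;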
I have seen processes where this strategy works like clockwork for batch jobs (creating a new table being faster than updating the old one).
You will need to test it and see if it suits your tables, and also look at the indexes you have.
Best of luck!
Note: you will also have to add the indexes and constraints to the new table, as CREATE TABLE AS will not include them.
Upvotes: 0
Reputation: 1967
The best way to improve the performance of an update/insert query is to use a bulk insert/update.
Please have a look at the post below, which should help you:
Does splitting up an update query improve performance
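As a rough sketch of what a bulk update can look like in Postgres, assuming the daily HDFS export is loaded into a staging table daily_batch keyed on id and that col_a/col_b are the columns to refresh (all table names, column names, and the file path are placeholders):

    BEGIN;

    -- Load the daily export into a staging table in one shot;
    -- COPY is far faster than inserting rows one at a time.
    CREATE UNLOGGED TABLE daily_batch (id bigint PRIMARY KEY, col_a text, col_b text);
    COPY daily_batch FROM '/path/to/daily_export.csv' WITH (FORMAT csv);

    -- Apply all changes in a single set-based UPDATE instead of
    -- millions of individual statements.
    UPDATE main_table m
    SET    col_a = b.col_a,
           col_b = b.col_b
    FROM   daily_batch b
    WHERE  m.id = b.id;

    COMMIT;

    DROP TABLE daily_batch;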
Upvotes: 2