user3380585

Reputation: 171

How to merge a 500 million row table with another 500 million row table

I have to merge two 500M+ row tables.

What is the best method to merge them?

I just need to display records from these two SQL Server tables when somebody searches on my webpage.

These are fixed tables; no one will ever change the data in them once they are live.

create view myview as select * from table1 union select * from table2

Is there any harm using the above method?

If I start merging 500M rows it will run for days, and if the machine reboots the database will go into recovery mode, and then I have to start from the beginning again.

Why am I merging these tables?

How are we currently doing it?

We have SQL Server 2008 R2.

We only load new data once a month and then just run selects against it.

Does replication help?

The biggest issue I am facing is managing these huge tables.

I hope I explained the situation.

Thanks & Regards

Upvotes: 1

Views: 2326

Answers (5)

Engr.Aftab Ufaq

Reputation: 6384

It's a bit late, but I hope it will help someone.

I have multiple MySQL databases with the same data on different servers; every database has more than 400 million rows. It was basically for a caller ID mobile application, so we wanted to create one centralized database on a single server. First I allowed remote connections to every database, then I wrote a function that takes 1 million rows from database1, saves them to a CSV file, and then uses LOAD DATA LOCAL INFILE. The whole operation took 5 days, and now I have a centralized database of about 1.8 billion rows. I have also applied indexes on multiple columns, so it is easy to search; the response time is about 500 to 700 milliseconds.

Here is the Node.js code for what I did:

// Assumes a configured Sequelize connection, e.g.:
// const { Sequelize } = require('sequelize');
// const sequelize = new Sequelize('mydb', 'user', 'password', { host: 'localhost', dialect: 'mysql' });

// Bulk-load one exported CSV file into the contactsdb table.
async function saveDataFromFile(filePath) {
    try {
        // IGNORE skips rows that would violate a unique key; IGNORE 1 LINES skips the CSV header.
        const results = await sequelize.query(
            `LOAD DATA INFILE '${filePath}' IGNORE INTO TABLE contactsdb FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (countryId,nationalNumber,personName)`
        );
        return { status: true, results: results[0] };
    } catch (err) {
        throw { status: false, message: err };
    }
}

Upvotes: -1

Alisa

Reputation: 3072

If the two tables are linked one to one, then you are wasting a lot of CPU time on each read, especially since you mentioned that the tables don't change at all. You should have only one table in this case. Try creating a new table that includes (at least) the needed columns from the two tables. You can do this with:

select A.*, B.* into newTable
from A left join B on A.x = B.y

or, if you want to drop rows from A that have no match in B:

select A.*, B.* into newTable
from A inner join B on A.x = B.y

Also note that you should at least have indexes on the join fields, to speed up the process.
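
For example, a minimal sketch (A, B, x and y are just the placeholder names from the snippets above):

create index IX_A_x on A (x);
create index IX_B_y on B (y);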

More details about the fields would help in giving a more precise answer as well.

Upvotes: 2

Stan

Reputation: 1999

1) Usually, to achieve more performance, developers split large tables into smaller ones and call this partitioning (horizontal partitioning, to be more precise, because there is also a vertical one). Your view is an example of such partitions joined together. Of course, it is mostly used to split a large amount of data into ranges of values (for example, table1 contains records with column [col1] < 0, while table2 contains those with [col1] >= 0). But it is fine even for unsorted data, because you get more room for speed improvements - for example, parallel reads if you put the tables on different storage. So this is a good choice.

2) Another way is to use the MERGE statement, supported in SQL Server 2008 and higher (see the sketch after the batch loop below) - http://msdn.microsoft.com/en-us/library/bb510625(v=sql.100).aspx.

3) Of course you can copy using INSERT+DELETE, but in that case, or when using the MERGE command, do it in small batches. Something like:

SET ROWCOUNT 10000
DECLARE @Count [int] = 1
WHILE @Count > 0 BEGIN
    ... INSERT+DELETE/MERGE transaction ...

    SET @Count = @@ROWCOUNT
END
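
And for point 2, a minimal sketch of MERGE (dbo.Target, dbo.Source, Id and Col1 are placeholder names, not from the question):

MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1) VALUES (s.Id, s.Col1);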

Upvotes: 3

blorkfish

Reputation: 22844

You may want to have a look at an Indexed View.
In this way, you can set up indexes on your view and get the best performance out of it. The expensive part of using Indexed Views is in the CRUD operations - but for read performance it would be your best solution.

http://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/

https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/
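
As a rough illustration only (dbo.Table1, dbo.Table2, Id, Col1 and Col2 are placeholder names): an indexed view needs SCHEMABINDING, two-part table names and a unique clustered index, and it cannot contain UNION, so it would have to be defined over a join or over a single merged table rather than over the UNION view from the question.

CREATE VIEW dbo.SearchView
WITH SCHEMABINDING
AS
SELECT t1.Id, t1.Col1, t2.Col2
FROM dbo.Table1 AS t1
JOIN dbo.Table2 AS t2 ON t2.Id = t1.Id;
GO

CREATE UNIQUE CLUSTERED INDEX IX_SearchView_Id
    ON dbo.SearchView (Id);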

Upvotes: 2

Dave.Gugg

Reputation: 6781

If your purpose is truly just to move the data from the two tables into one table, you will want to do it in batches - 100K records at a time, or something like that. I'd guess you crashed before because your T-Log got full, although that's just speculation. Make sure to throw in a checkpoint after each batch if you are in Full recovery mode.
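
A minimal sketch of that kind of batched copy (dbo.Table1, dbo.NewTable, Id and Col1 are placeholder names, not from the question):

DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.NewTable (Id, Col1)
    SELECT TOP (100000) s.Id, s.Col1
    FROM dbo.Table1 AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.NewTable AS n WHERE n.Id = s.Id);

    SET @Rows = @@ROWCOUNT;

    CHECKPOINT; -- after each batch, as suggested above
END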

That said, I agree with all the comments that you should explain why you are doing this - it may not be necessary at all.

Upvotes: 2
