Reputation: 2447
I'd like to transfer a large amount of data (around 80 million records) from SQL Server to MongoDB using a solution I wrote in C#. I want to transfer, say, 200 000 records at a time, but my problem is keeping track of what has already been transferred. Normally I'd do it as follows:
1. Gather IDs from destination to exclude from source scope
2. Read from source (excluding IDs already in destination)
3. Write to destination
4. Repeat
The problem is that I build a string in C# containing all the IDs that already exist in the destination, so that I can exclude them from the source selection, e.g.
select * from source_table where id not in (<My large list of IDs>)
As you can imagine, once I have inserted 600 000+ records, the string of IDs gets very large and slows things down even more. So I'm looking for a way to iterate through, say, 200 000 records at a time, like a cursor. I have never done something like this, so I am here looking for advice.
Just as a reference, I do my reads as follows:
SqlConnection conn = new SqlConnection(myConnStr);
conn.Open();
SqlCommand cmd = new SqlCommand("select * from mytable where id not in (" + bigListOfIDs + ")", conn);
SqlDataReader reader = cmd.ExecuteReader();
if (reader.HasRows)
{
    while (reader.Read())
    {
        // Populate objects for insertion into MongoDB
    }
}
So basically, I want to know how to iterate through large amounts of data without selecting all that data in one go, or having to filter the data using large strings. Any help would be appreciated.
Upvotes: 2
Views: 288
Reputation: 9610
There are many different ways of doing this, but I would suggest first that you don't reinvent the wheel and look at existing tools instead. Many programs are designed to export and import data between different databases. Some are very flexible and expensive, others have free options, and most DBMS products include something of their own.
Option 1:
Use the SQL Server Management Studio (SSMS) export wizard.
This allows you to export to different destinations (for example a flat file), and you can write a query to select the data if required. More information here:
https://www.mssqltips.com/sqlservertutorial/202/simple-way-to-export-data-from-sql-server/
Option 2:
Export your data in ascending ID order. Store the last exported ID in a table.
Export the next set of data where ID > lastExportedID
Option 3:
Create a copy of your data in a back-up table. Export from this table, and delete the records as you export them.
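Option 2 could be sketched roughly as follows. This is only an illustration under some assumptions: the table and column names (mytable, id) are taken from the question, id is assumed to be a unique, indexed integer column, and the class KeysetExport with its handleRow and saveCheckpoint callbacks is a hypothetical helper, not part of any library. It uses the generic System.Data interfaces, which SqlConnection implements.

```csharp
using System;
using System.Data;

public static class KeysetExport
{
    // TOP + ORDER BY id returns the next batch after @lastId,
    // regardless of gaps in the ID sequence.
    public const string BatchSql =
        "SELECT TOP (@batchSize) * FROM mytable WHERE id > @lastId ORDER BY id";

    // Reads the table batch by batch. handleRow is called once per row,
    // saveCheckpoint once per non-empty batch with the last ID read
    // (hypothetical callbacks, e.g. "buffer for MongoDB" and "store lastExportedID").
    // Returns the last ID that was read.
    public static long ExportAll(IDbConnection conn, long startAfterId, int batchSize,
                                 Action<IDataRecord> handleRow, Action<long> saveCheckpoint)
    {
        long lastId = startAfterId;
        while (true)
        {
            int rowsInBatch = 0;
            using (IDbCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = BatchSql;
                AddParameter(cmd, "@batchSize", batchSize);
                AddParameter(cmd, "@lastId", lastId);
                using (IDataReader reader = cmd.ExecuteReader())
                {
                    int idOrdinal = reader.GetOrdinal("id");
                    while (reader.Read())
                    {
                        handleRow(reader);
                        lastId = Convert.ToInt64(reader.GetValue(idOrdinal));
                        rowsInBatch++;
                    }
                }
            }
            if (rowsInBatch > 0) saveCheckpoint(lastId);
            // A short (or empty) batch means the end of the table was reached.
            if (rowsInBatch < batchSize) return lastId;
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

With an open SqlConnection this might be called as KeysetExport.ExportAll(conn, lastExportedId, 200000, row => { /* populate MongoDB objects */ }, id => { /* store id */ }). Because only the last ID is passed to the query, the statement stays the same size no matter how many records have already been transferred.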
Upvotes: 0
Reputation: 259
Need more rep to comment, but if you sort by your id column you could change your where clause to become
select * from source_table where <lastusedid> < id and id <= <lastusedid> + 200000
which covers an ID range of 200 000 (exactly 200 000 rows only if the IDs have no gaps), and you only need to store a single integer.
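As a rough sketch of how those ranges could be generated in C#: the helper below is hypothetical (the names IdWindows and Split are made up for illustration) and assumes the highest ID has been fetched up front, for example with select max(id) from source_table.

```csharp
using System;
using System.Collections.Generic;

public static class IdWindows
{
    // Yields (lowerExclusive, upperInclusive) pairs that together cover
    // the ID range (startAfterId, maxId] in steps of windowSize.
    public static IEnumerable<(long LowerExclusive, long UpperInclusive)> Split(
        long startAfterId, long maxId, long windowSize)
    {
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
        for (long lower = startAfterId; lower < maxId; lower += windowSize)
        {
            // The last window is clipped so it never runs past maxId.
            yield return (lower, Math.Min(lower + windowSize, maxId));
        }
    }
}
```

Each pair maps onto one query of the form where id > lower and id <= upper. A window can return fewer than 200 000 rows (or none) when the IDs have gaps, so the loop should run until maxId is reached rather than stop at the first empty result.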
Upvotes: 1