ebro
ebro

Reputation: 67

Migrate from SQL to MongoDB?

I am researching on the possibilities of migrating data from SQL 2012 to mongoDB. And my manager specifically asked me to see the time it takes to process billions of rows in SQL and MongoDB to make a decision to migrate or not. Any recommendations or suggestions or places I should visit to research more? So far I have done

  1. installed MongoDB in my development environment
  2. i have been able to connect to MongoDB, created Databases and collections

Questions I have now 3. how to import the database in SQL to Mongo (say migrate Adventure Works)

Thanks In Advance!

Upvotes: 4

Views: 14560

Answers (2)

Ashley Davis
Ashley Davis

Reputation: 10040

I've created a Node.js script that replicates an SQL database to MongoDB.

You can find it here.

To use...

Clone the repo:

git clone https://github.com/ashleydavis/sql-to-mongodb

Install dependencies:

cd sql-to-mongodb
npm install

Setup your config:

Edit config.js. Add your SQL connection string and details for your MongoDB database.

Run the script:

node index.js

This can take a while to complete! But when it does you will have a copy of your SQL database in MongoDB. Please let me know if there are any issues.

Upvotes: 5

Markus W Mahlberg
Markus W Mahlberg

Reputation: 20703

Some best practices I learned the hard way.

Do a partial import

When planning a MongoDB cluster, you need to have an idea how big the average document size is. In order to do that, import some 10k records of your data. This gives you an idea on how long the actual import will take in orders of magnitude:

time to migrate

where t is the time it took to import n documents of c.

Repeat this for all target collections. After that, issue a

db.stats()

in the mongo shell. You will be presented with some size statistics. You now have approximations to two key factors: the time it takes to import (by summing up the results of above calculation) and the storage space you will need.

Create the indices on the partial import

Create the indices you are going to need. As for time calculations, the same as above applies. But there is a different thing: indices should reside in RAM, so you need to extrapolate the actual RAM you need when all records are migrated.

Chances are that it isn't cost efficient to store all data on one machine, since RAM is getting costly after a certain point (calculation are necessary here). If that is the case, you need to shard.

When sharding: Choose a proper shard key

It can not be overemphasized how important it is to have a proper shard key right from the start: Shard keys can not be changed. Invest some time with the developers to find a proper shard key.

When sharding: Pre-split chunks

The last thing you want during data migration is to have it being delayed by the balancer trying to balance out the chunks. So you should pre-split your chunks and distribute them among your shards.

Upvotes: 4

Related Questions