Reputation: 896
I have multiple terabyte-sized files that need to be loaded into a database sitting on a high-performance Azure SQL server in the cloud.
For now I'm trying to load these files via an SSIS package, and it takes more than 12 hours to complete for 5 files.
I believe HDInsight/Databricks exist in Azure to do big-data ETL and to analyze data using Ambari and other UIs. But is it possible to use the same (HDInsight or Databricks) to load the huge data files into a SQL table/database? (e.g., using clusters to load multiple files in parallel.)
Any suggestion/help is much appreciated.
Upvotes: 0
Views: 221
Reputation: 1806
Since you mentioned SSIS, I was wondering if you have considered Azure Data Factory (personally I consider it to be the next version of SSIS in the cloud). The copy activity should do the trick, and it does support parallel execution. Since you are targeting Azure SQL, we also need to consider congestion on the sink side, i.e. the scenario where all the terabytes of files try to write to the SQL table at the same time.
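As a rough illustration of the idea, a copy activity in an ADF pipeline can be defined in JSON along these lines. This is only a sketch: the pipeline/dataset names (`BlobFilesDataset`, `AzureSqlTableDataset`) are placeholders I've invented, and the `parallelCopies` / `dataIntegrationUnits` values are just starting points you would tune against the sink's capacity:

```json
{
  "name": "CopyFilesToAzureSql",
  "properties": {
    "activities": [
      {
        "name": "CopyToSqlTable",
        "type": "Copy",
        "inputs": [ { "referenceName": "BlobFilesDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" },
          "parallelCopies": 8,
          "dataIntegrationUnits": 32
        }
      }
    ]
  }
}
```

Keeping `parallelCopies` modest is one way to avoid the congestion scenario above, where too many writers hit the SQL table at once.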
Upvotes: 1