Reputation: 21
I have built an SSIS package to load data from CSV files into a database table. The CSV files are first downloaded from Azure Blob Storage using a PowerShell script, and then each of these files is loaded into a target table in SQL Server.
So I set up a ForEach enumerator to loop through all the files and load the data into the target table, but the process is too slow. Each file has just one row of data (around 30 columns), so to load, say, 20k rows I have to loop through 20k files, and the package takes HOURS to run.
I tried looking for alternatives for loading data from multiple files but couldn't find any concrete solution. One answer by Hilmar has an interesting suggestion to use a Script Task to improve performance, but I don't have any C# know-how whatsoever.
Has anyone run into a similar problem or overcome it? If anyone has a sample that loads multiple files using a Script Task, it would help a lot.
Any help is appreciated.
Upvotes: 1
Views: 1309
Reputation: 21
Kind of feel a little outwitted here. I deployed my package to Integration Services and scheduled a run for it via SQL Agent.
Guess what! A package that took 12 hours to load 6k files now loads 20k files in under 30 minutes. I never would have thought that executing a package in SSDT and executing it on the server would give such contrasting results.
I am not sure what the exact reason for this could be, but I guess the time spent logging all the execution results in SSDT could have made a big difference.
I will keep searching for the exact reason for this behaviour, but the significant decrease in execution time is acceptable to me, as I don't have a large number of files to load data from every day.
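For reference, the deployed package can also be run outside SSDT from the command line with dtexec, which is roughly what the SQL Agent job step does (the folder, project, package and server names below are placeholders, not my actual ones):

dtexec /ISServer "\SSISDB\MyFolder\MyProject\MyPackage.dtsx" /Server "localhost"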
I would have gone for the option of concatenating the files had I not needed the original files, as we have now added a mail task to send files with errors (truncation/data) back to the dev team.
Thanks for the help though @everyone.
Upvotes: 0
Reputation: 3586
To conclude the conversation in the comments, here is a script, Merging multiple CSV files into one using PowerShell, to load all the data in one go (assuming all the files have the same format), with a tiny tweak to traverse subfolders and append a carriage return to the end of each file:
if (Test-Path "COMBINED_FILE.csv") {Remove-Item "COMBINED_FILE.csv"} $getFirstLine = $true Get-ChildItem "SOURCE_ROOT_FOLDER\*.csv" -Recurse -File | foreach { $filePath = $_.FullName $lines = Get-Content $filePath $linesToWrite = switch($getFirstLine) { $true {$lines} $false {$lines | Select -Skip 1} } + [System.Environment]::NewLine $getFirstLine = $false Add-Content "COMBINED_FILE.csv" $linesToWrite }
Upvotes: 1