user2482329

Reputation: 31

SSIS processing a large number of flat files is painfully slow

From one of our partners, I receive about 10,000 small tab-delimited text files with roughly 30 records in each file. It is impossible for them to deliver the data in one big file.

I process these files in a Foreach Loop container. After reading a file, 4 column derivations are performed, and finally the contents are stored in a SQL Server 2012 table.

This process can take up to two hours.

I already tried combining the small files into one big file and then importing that file into the same table, but that takes even more time.

Does anyone have any suggestions to speed up processing?

Upvotes: 3

Views: 5603

Answers (2)

billinkc

Reputation: 61201

One thing that sounds counterintuitive is to replace your single Derived Column Transformation with 4, each performing one task. This can improve performance because the engine can better parallelize the operations if it can determine that the changes are independent of one another.

You might be running into network latency since you are referencing files on a remote server. Perhaps you can improve performance by copying those remote files to the local box before you begin processing. The performance counters you'd be interested in are

  • Network Interface / Current Bandwidth
  • Network Interface / Bytes Total/sec
  • Network Interface / Transfers/sec

The other thing you can do is replace your destination and derived columns with a Row Count transformation. Run the package a few times over all the files to establish your theoretical maximum speed; you won't be able to go any faster than that. Then add your Derived Column back in and re-run. That should help you understand whether the drop in performance is due to the destination, to the derived column operation, or simply to the package already running as fast as the I/O subsystem allows.

Upvotes: 2

Kyle Hale

Reputation: 8120

Do your files offer an easy way (e.g. their names) to subdivide them into even (or mostly even) groups? If so, you could run your loads in parallel.

For example, let's say you could divide them into 4 groups of 2,500 files each.

  1. Create a Foreach Loop container for each group.
  2. For each group's destination, write the records to that group's own staging table.
  3. Combine all records from all staging tables into your big table at the end (see the T-SQL sketch after this list).
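
For that last step, here is a minimal T-SQL sketch, assuming four staging tables named dbo.Staging1 through dbo.Staging4 that share the layout of a hypothetical destination table dbo.BigTable (all of these names and columns are placeholders, not taken from the question):

    -- Consolidate the four parallel loads into the final table.
    -- dbo.Staging1..4 and dbo.BigTable are assumed, illustrative names.
    INSERT INTO dbo.BigTable (Col1, Col2, Col3, Col4)
    SELECT Col1, Col2, Col3, Col4 FROM dbo.Staging1
    UNION ALL
    SELECT Col1, Col2, Col3, Col4 FROM dbo.Staging2
    UNION ALL
    SELECT Col1, Col2, Col3, Col4 FROM dbo.Staging3
    UNION ALL
    SELECT Col1, Col2, Col3, Col4 FROM dbo.Staging4;

    -- Empty the staging tables so the next batch starts clean.
    TRUNCATE TABLE dbo.Staging1;
    TRUNCATE TABLE dbo.Staging2;
    TRUNCATE TABLE dbo.Staging3;
    TRUNCATE TABLE dbo.Staging4;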

If the files themselves don't offer an easy way to group them, consider pushing them into subfolders when your partner sends them over, or insert the file paths into a database table so you can write a query to subdivide them and use the file path field as a variable in the Data Flow Task.
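
As one sketch of that query approach, assuming a hypothetical dbo.FileQueue table with one row per file path, NTILE (available in SQL Server 2012) can deal the paths into four evenly sized groups; each Foreach Loop then processes only the rows for its group number:

    -- dbo.FileQueue and its columns are assumed, illustrative names.
    CREATE TABLE dbo.FileQueue
    (
        FilePath    NVARCHAR(260) NOT NULL PRIMARY KEY,
        GroupNumber INT           NULL
    );

    -- Deal the file paths into 4 evenly sized groups.
    WITH Numbered AS
    (
        SELECT FilePath,
               NTILE(4) OVER (ORDER BY FilePath) AS GrpNo
        FROM dbo.FileQueue
    )
    UPDATE q
    SET    q.GroupNumber = n.GrpNo
    FROM   dbo.FileQueue AS q
    JOIN   Numbered AS n ON n.FilePath = q.FilePath;

    -- Each parallel loop reads only its own slice, e.g. group 1:
    SELECT FilePath FROM dbo.FileQueue WHERE GroupNumber = 1;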

Upvotes: 0
