Deepesh

Reputation: 840

Transformation taking too long to execute

With reference to my previous post, here is the link.

I have 130,000 records in my source. When I ran the transformation, it was still running after 16 hours.

Will increasing the heap size in the spoon.bat script help reduce the execution time of my transformation? (I changed PENTAHO_DI_JAVA_OPTIONS="-Xmx256m -XX:MaxPermSize=256m" to PENTAHO_DI_JAVA_OPTIONS="-Xmx2g -XX:MaxPermSize=256m".)

What other ways are there to improve the performance of the transformation?

Upvotes: 1

Views: 6349

Answers (2)

RASHI

Reputation: 11

  1. Avoid sort operations.
  2. Avoid JavaScript steps if possible.
  3. One large JavaScript step runs faster than three consecutive smaller ones, so try to combine them.
  4. Tick "Manage thread priorities" in the Misc tab of the transformation settings.
  5. If possible, don't remove fields in a Select Values step.
  6. Increase the number of copies to start (reference: http://help.pentaho.com/Documentation/5.4/0L0/0Y0/070/030).
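Item 6 helps because a step started with several copies processes its rows in parallel threads. The Java sketch below is only an analogy for that idea, not PDI code: the hypothetical `slowTransform` method stands in for a slow per-row step, and rows are dealt round-robin to N worker threads, roughly the way PDI hands rows to step copies by default. When each row mostly waits on I/O, four workers should finish in about a quarter of the time; for CPU-bound work the gain is limited by the number of cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: the "copies" here are plain worker threads, not PDI step copies.
class StepCopiesSketch {

    // Hypothetical stand-in for a slow per-row step (e.g. a remote database call).
    static long slowTransform(long row) {
        try {
            Thread.sleep(1); // simulate about 1 ms of I/O wait per row
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return row * 2;
    }

    // Deal rows round-robin to `copies` workers and sum the transformed values.
    static long run(List<Long> rows, int copies) {
        ExecutorService pool = Executors.newFixedThreadPool(copies);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int c = 0; c < copies; c++) {
                final int copy = c;
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    // Worker `copy` handles rows copy, copy + copies, copy + 2 * copies, ...
                    for (int i = copy; i < rows.size(); i += copies) {
                        sum += slowTransform(rows.get(i));
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // blocks until that worker has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Long> rows = new ArrayList<>();
        for (long r = 0; r < 400; r++) {
            rows.add(r);
        }
        for (int copies : new int[] {1, 4}) {
            long start = System.nanoTime();
            long total = run(rows, copies);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(copies + " worker(s): total=" + total + " in " + ms + " ms");
        }
    }
}
```

The total is the same for any number of workers; only the elapsed time changes, which is the same comparison you make in the Step Metrics tab after changing the number of copies.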

Upvotes: 1

mzy

Reputation: 1764

I also needed to speed up a transformation. These are my settings: PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m" "-XX:MaxPermSize=1024m".
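A larger heap mainly helps when the transformation is memory-bound (for example, big in-memory sorts or lookup caches); it will not speed up steps that are waiting on a database. To confirm that a heap setting is actually accepted, you can ask the JVM itself. The small program below is a hypothetical helper, not part of PDI: run it with the same Java installation Spoon uses and the same option, e.g. `java -Xmx2048m HeapCheck`. `maxMemory()` should come out close to the `-Xmx` value (it may be slightly lower), and `os.arch` shows whether the JVM is 32-bit, which may refuse a 2 GB heap. Note also that Java 8 removed the permanent generation, so `-XX:MaxPermSize` has no effect there.

```java
// Hypothetical helper: prints the heap limits of the JVM it is running in.
class HeapCheck {

    // Convert bytes to whole mebibytes (MiB), rounding down.
    static long mib(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will attempt to use (driven by -Xmx).
        System.out.println("max heap:     " + mib(rt.maxMemory()) + " MiB");
        // totalMemory(): heap reserved right now; freeMemory(): unused part of it.
        System.out.println("current heap: " + mib(rt.totalMemory()) + " MiB");
        System.out.println("free heap:    " + mib(rt.freeMemory()) + " MiB");
        System.out.println("java.version: " + System.getProperty("java.version"));
        // os.arch reports the JVM's architecture, e.g. "x86" for a 32-bit JVM.
        System.out.println("os.arch:      " + System.getProperty("os.arch"));
    }
}
```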

The final speed depends on the design of the transformation. In general:

  • Hardware parameters of the machine or server where you run it. (In my case, a job runs about twice as fast on a new server as on my laptop.) Are other processes running on the same machine while the transformation runs?
  • Is the transformation optimized? Do you use JavaScript steps a lot? They are slower (try to replace them with other steps). What kind of storage do you use? How many database joins do you use?
  • Have you identified the bottlenecks of the transformation? When you run it, you can see which steps are slowing it down in the Step Metrics tab of the Execution Results (focus on the Speed and input/output columns). Typical bottlenecks are database joins to a remote server, merge joins and sort steps. You can run more instances of such a step: right-click the step > Change number of copies to start... > set it to 2 or more, then re-run the transformation and compare the results.
  • Use cache options for database lookups.
  • Avoid "slow steps" if possible (those that need to process all rows to create a result): Sort rows, Merge join, Unique rows, Row denormalizer. When the first row reaches such a step, the step waits until the last row arrives. Only then does it process all rows and create a result, and the transformation continues.
  • Try to use clustering.
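The cache advice amounts to memoizing lookups: each distinct key goes to the database once, and repeated keys are answered from memory. The sketch below illustrates that idea in plain Java; `CachedLookup` and the lambda backend are hypothetical stand-ins for a database lookup step and its query. The `HashMap` here is unbounded, whereas a real lookup cache usually caps its size, and the class is not thread-safe, so each step copy would need its own instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizing wrapper around a slow key -> value lookup.
class CachedLookup<K, V> {
    private final Function<K, V> backend; // stand-in for the database query
    private final Map<K, V> cache = new HashMap<>();
    private int backendCalls = 0;

    CachedLookup(Function<K, V> backend) {
        this.backend = backend;
    }

    V get(K key) {
        // computeIfAbsent calls the backend only on a cache miss.
        // Note: a null result is not stored, so null lookups are retried each time.
        return cache.computeIfAbsent(key, k -> {
            backendCalls++;
            return backend.apply(k);
        });
    }

    int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        // Hypothetical "query": customer id -> region.
        CachedLookup<Integer, String> regions =
            new CachedLookup<>(id -> id % 2 == 0 ? "EU" : "US");
        int[] customerIds = {1, 2, 1, 1, 2, 3, 2, 1};
        for (int id : customerIds) {
            regions.get(id);
        }
        System.out.println("rows: " + customerIds.length
            + ", backend calls: " + regions.backendCalls());
    }
}
```

With these eight sample rows only three distinct ids reach the backend, so the program prints `rows: 8, backend calls: 3`. The fewer distinct keys your stream has relative to its row count, the more a lookup cache saves.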


Upvotes: 3
