LoveTW

Reputation: 3832

Dummy step does not work in Job

Each transformation creates a CSV file in a folder, and I want to upload all of them once every transformation is done. I added a Dummy step, but the process didn't work as I expected: each transformation triggers the Hadoop Copy Files step on its own. Why? And how should I design the flow? Thanks.

[screenshot of the job layout]

Upvotes: 1

Views: 1845

Answers (2)

nsousa

Reputation: 4544

You cannot join the transformations the way you have.

Each transformation, upon success, follows its hop to the Dummy step, so the Dummy (and everything after it) is reached once for EVERY transformation.

If you want the Hadoop Copy Files step to run only once, after the last transformation finishes, you need to do one of two things:

  1. Run the transformations in sequence, where each KTR is called upon success of the previous one (slower).

  2. As suggested in another answer, launch the KTRs in parallel, but with one caveat: they need to be called from a sub-job. Here's the idea:

Your main job has a Start entry, calls the sub-job and, upon its success, calls the Hadoop Copy Files step.

Your sub-job has a Start entry from which all transformations are called in different flows. You use the "Launch next entries in parallel" option so all of them are launched at once.

The sub-job keeps running until the last transformation finishes, and only then does the flow pass to the Hadoop Copy Files step, which is therefore launched exactly once.
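Purely for illustration, here is the same control flow expressed as a Python sketch. The .ktr file names and the use of pan.sh as the per-transformation runner are assumptions; in practice you would wire this up graphically in Spoon as .kjb/.ktr files, not in code:

```python
# Sketch of the main-job / sub-job pattern, assuming 9 transformations
# named transform_1.ktr .. transform_9.ktr (hypothetical names).
from concurrent.futures import ThreadPoolExecutor
import subprocess

TRANSFORMATIONS = [f"transform_{i}.ktr" for i in range(1, 10)]

def run_transformation(ktr):
    # Stand-in for one KTR; pan.sh is PDI's command-line transformation runner.
    subprocess.run(["pan.sh", "-file", ktr], check=True)

def sub_job():
    # "Launch next entries in parallel": all KTRs start at once, and this
    # function only returns after the last one has finished.
    with ThreadPoolExecutor(max_workers=len(TRANSFORMATIONS)) as pool:
        for _ in pool.map(run_transformation, TRANSFORMATIONS):
            pass  # iterating re-raises any failure from a transformation

def hadoop_copy_files():
    # Stand-in for the "Hadoop Copy Files" job entry.
    print("copying all CSVs to HDFS")

def main_job():
    sub_job()            # blocks until every transformation is done
    hadoop_copy_files()  # now runs exactly once

if __name__ == "__main__":
    main_job()
```

The key point the sketch shows: the parallel launches are contained inside the sub-job, so the main job sees a single success signal and the copy step cannot fire more than once.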

Upvotes: 1

Rishu S

Reputation: 3968

First of all, if possible, launch the .ktr files in parallel (right-click on the START step > click "Launch Next Entries in parallel"). This ensures that all the KTRs are launched in parallel.

Secondly, you can choose any of the steps below, depending on what is feasible for you (instead of the Dummy step):

  1. "Checks if files exist" Step: Before moving to the Hadoop step, you can do a small check if all the files has been properly created and then proceed with your execution.
  2. "Wait For" Step: You can give some time to wait for all the step to complete before moving to the next entry. I don't suggest this since the time of writing a csv file might vary, unless you are totally sure of some time.
  3. "Evaluate files metrics" : Check the count of the files before moving forward. In your case check if the file count is 9 or not.

The point is simply to do some sort of check on the files before you copy the data to HDFS.
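For illustration, a minimal Python sketch of the kind of check options 1 and 3 perform before the copy. The output folder path is a hypothetical placeholder, and the expected count of 9 files is taken from the question:

```python
# Rough sketch of a file-count check before copying to HDFS.
import glob
import time

OUTPUT_DIR = "/tmp/csv_output"  # hypothetical output folder
EXPECTED_FILES = 9              # one CSV per transformation, per the question

def all_files_present(timeout_s=300, poll_s=5):
    """Poll until the expected number of CSVs exists, or time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if len(glob.glob(f"{OUTPUT_DIR}/*.csv")) >= EXPECTED_FILES:
            return True
        time.sleep(poll_s)
    return False

if all_files_present():
    pass  # safe to run the Hadoop Copy Files step
else:
    raise RuntimeError("Not all CSV files were created in time")
```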

Hope it helps :)

Upvotes: 2
