Reputation: 63
I currently have an ETL job that reads a source table with over 1 million records and processes them sequentially into a target table. Both source and target are in the same schema, but in between there is a call to an external REST endpoint that posts some data from the source table. The job is performing very badly right now. Can someone please suggest ways to improve performance, for example by parallelizing the work or reducing the fetch size, so that the job's running time goes down?
Upvotes: 0
Views: 229
Reputation: 5164
Check whether your REST endpoint supports batching, and if it does, use it. Most APIs do these days. (With batching, you send multiple requests in one JSON/XML payload to the endpoint.)
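As a rough illustration of batching, here is a minimal sketch in plain Java. It assumes the endpoint accepts a JSON array of records in a single POST body, which you would need to confirm against the API's documentation. The class name `BatchSketch` and both method names are made up for this example, and the code is written as standalone Java, so it may need adapting before it runs inside a Kettle step.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {
    // Split the rows into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    // Join already-serialized JSON objects into one JSON array body.
    // ASSUMPTION: the endpoint takes a plain JSON array as its batch format.
    static String toJsonArray(List<String> jsonObjects) {
        return "[" + String.join(",", jsonObjects) + "]";
    }
}
```

With 1 million rows and a batch size of 500, this would turn 1 million HTTP calls into 2,000, provided the endpoint allows batches that large.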
Otherwise, simply run multiple copies of the REST client step. You should be able to get away with at least 8-10, but check that you're not limited in some way at the other end, for example by a rate limit.
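Running several copies of a step amounts to fanning the calls out over a fixed number of workers. The sketch below shows the same idea in plain Java, outside Kettle, using a fixed thread pool. `ParallelPostSketch`, `processAll`, and the `call` function are hypothetical names standing in for whatever performs one REST request, and the right worker count depends on what the remote side tolerates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelPostSketch {
    // Apply `call` to every item using `workers` threads; results keep input order.
    static <T, R> List<R> processAll(List<T> items, int workers, Function<T, R> call) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> call.apply(item);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For a million rows it is probably better to submit batches rather than single rows, so that the list of pending futures stays small.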
Finally, if none of that helps, try writing your own HTTP client in the Java class step (not the JavaScript step), and make sure you authenticate with the REST endpoint only once, not on every request, by keeping the session open. I'm not 100% convinced the REST client step does this, and authentication is often the most expensive part.
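To illustrate the "authenticate once" idea, here is a minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11 or later). It assumes token-based authentication with a bearer header; your endpoint may use cookies or something else entirely. `SessionClientSketch` and the `login` supplier are hypothetical placeholders for your real login call, and the Java class step's embedded compiler may not accept every language feature used here, so treat it as standalone Java.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Supplier;

final class SessionClientSketch {
    // One client for the whole job, so connections can be reused across requests.
    private final HttpClient client = HttpClient.newHttpClient();
    // HYPOTHETICAL: performs the real login call and returns a token.
    private final Supplier<String> login;
    private String token; // cached after the first login

    SessionClientSketch(Supplier<String> login) {
        this.login = login;
    }

    // Log in lazily on first use, then reuse the cached token.
    synchronized String token() {
        if (token == null) {
            token = login.get();
        }
        return token;
    }

    // Build a POST that carries the cached token (ASSUMPTION: bearer auth).
    HttpRequest buildPost(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token())
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Send the request and return the HTTP status code.
    int post(String url, String jsonBody) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildPost(url, jsonBody), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A real implementation would also need to refresh the token when it expires, for example by clearing the cached value after a 401 response and logging in again.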
Upvotes: 1