Reputation: 1903
Good day,
I have a Pentaho Kettle file that runs as a batch job.
Basically, this file contains 2 main steps:
First step: read from an input file (txt file) and store the rows in table1.
Second step: same as the first step, read from the same input file and store the rows in table2.
This batch was working fine until I put in a 20MB input file. It requires more than 7 hours to finish the job.
Below are some test cases I have done:
15360 records, 1.4MB: 2 minutes and 20 seconds (140 seconds total).
30720 records, 2.8MB: 7 minutes and 30 seconds (450 seconds total).
61440 records, 5.5MB: 26 minutes and 55 seconds (1615 seconds total).
250000 records, 20MB: 7 hours and 30 minutes.
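Working out the per-record time from the numbers above (7 hours 30 minutes = 27000 seconds), the job does not scale linearly:

140 s / 15360 records ≈ 0.009 s per record
450 s / 30720 records ≈ 0.015 s per record
1615 s / 61440 records ≈ 0.026 s per record
27000 s / 250000 records ≈ 0.108 s per record

Doubling the input roughly triples the run time, so the time spent per record keeps growing with file size.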
In the log, I found some steps that account for most of the time consumed. They are as follows:
1. Text file input
2. Select values
3. Modified Java Script Value
Both main steps contain these 3 Pentaho Kettle steps. For the 20MB input file, the first step only takes around 7 minutes, but the second step takes more than 7 hours.
I have been looking at it for quite a long time and still can't find out what the problem is.
Kindly advise.
Upvotes: 0
Views: 1444
Reputation: 3968
There might be multiple reasons (I assume). First of all, try to optimize steps like "Select Values" and "Modified JavaScript". Some performance tuning tips are given here.
Also, you may try to increase the Java memory in pan.sh:
Change JAVAMAXMEM to some higher value like 1024.
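For reference, this is a sketch of the kind of edit meant; the exact default value and surrounding lines may differ between PDI versions, so treat the 512 shown here as an assumption:

# pan.sh (and kitchen.sh) set the JVM heap via the JAVAMAXMEM variable
# before (assumed default):
JAVAMAXMEM="512"
# after, giving the transformation a 1024 MB heap:
JAVAMAXMEM="1024"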
Hope these changes might help :)
Upvotes: 1