Reputation: 2043
I have a large number (over 200k) of PDF files on a remote drive such as \remote\location. I need to read all the file names from that directory and insert them into a database.
I have tried the "Get File Names" step. However, it does not load the file names, and the transformation stops immediately.
I also tried it on a smaller set of files in a subdirectory of the same remote directory, and that works fine.
However, when I run it against all the files (including subdirectories), it crashes with an out-of-memory error: Failed to execute runnable (java.lang.OutOfMemoryError: Java heap space)
Is there a way to process the files in batches of 1000 at a time?
Upvotes: 3
Views: 4281
Reputation: 3294
You ran out of memory. Edit the spoon.sh file and search for this line:
PENTAHO_DI_JAVA_OPTIONS="-Xmx512m -XX:MaxPermSize=512m"
If you have 4 GB of memory available, you can raise the maximum heap to 2 GB (the exact value is up to you):
PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m -XX:MaxPermSize=1024m"
Restart Spoon and try again.
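If raising the heap is not enough, the batching asked about in the question can also be done outside Kettle. The following is only a rough sketch, not a Kettle feature: it assumes a small Python script is acceptable, and the function names `iter_pdf_names` and `in_batches` are made up for illustration. It walks the directory lazily, so only one batch of names is held in memory at a time.

```python
import os
from typing import Iterator, List


def iter_pdf_names(root: str) -> Iterator[str]:
    """Lazily yield the paths of PDF files under root, including subdirectories."""
    pending = [root]  # directories still to be scanned
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.name.lower().endswith(".pdf"):
                    yield entry.path


def in_batches(paths: Iterator[str], size: int = 1000) -> Iterator[List[str]]:
    """Group an iterator of paths into lists of at most `size` items."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch
```

Each list produced by `in_batches(iter_pdf_names(root))` could then be written to the database in one insert, or dumped to a text file that a transformation reads instead of scanning the remote drive itself.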
Upvotes: 2
Reputation: 8068
Kettle is very memory hungry. For example, I typically need 8 GB to run a relatively long and complex process on files of just 250,000 records. So before I run kitchen or pan, I always set JAVAMAXMEM appropriately high. You set it in units of MB, so for 4 GB you'd set:
JAVAMAXMEM=4096 kitchen.sh ...
Upvotes: 0