Vissu

Reputation: 2043

Pentaho data integration "get file names" not loading big list of files

I have a large number (over 200k) of PDF files on a remote drive like \remote\location. I have to read all file names from that directory and insert them into a database.

I have tried the "Get File Names" step, but it does not load the file names and the transformation stops immediately.
With a smaller number of files in a subdirectory of the same remote location, it works fine.
However, when I try all files (including subdirectories), it crashes with an out-of-memory error: Failed to execute runnable (java.lang.OutOfMemoryError: Java heap space).

Is there a way I can process the files in batches of 1000?
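Not a fix inside PDI itself, but to illustrate the batching idea: outside the "Get File Names" step, the listing can be streamed and chunked so no 200k-entry list is ever held in memory. A minimal shell sketch, assuming a Unix-like mount of the share; `SRC` and `OUTDIR` are placeholder paths, not anything from the question:

```shell
#!/bin/sh
# Sketch: stream file names from a directory tree and write them out in
# chunks of 1000 lines, so memory use stays constant regardless of count.
# SRC and OUTDIR are hypothetical placeholders for the remote share and a
# local work directory.
SRC="${SRC:-/mnt/remote/location}"
OUTDIR="${OUTDIR:-/tmp/filename_batches}"
mkdir -p "$OUTDIR"

# find emits one path per line as it walks the tree; split consumes that
# stream and writes 1000-line chunk files (batch_aa, batch_ab, ...).
find "$SRC" -type f -name '*.pdf' | split -l 1000 - "$OUTDIR/batch_"
```

A loader (a small PDI transformation or a script) could then insert one chunk file at a time into the database.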

Upvotes: 3

Views: 4281

Answers (2)

Otto

Reputation: 3294

You ran out of memory. Edit the spoon.sh file and search for this line:

PENTAHO_DI_JAVA_OPTIONS="-Xmx512m -XX:MaxPermSize=512m"

If you have 4 GB of memory available, you can set 2 GB (it's up to you):

PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m -XX:MaxPermSize=1024m"

Restart Spoon and try again.
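As a variant (my assumption, not stated in the answer above): many PDI versions only assign PENTAHO_DI_JAVA_OPTIONS in spoon.sh when the variable isn't already set, so on such versions the options can be exported in the launching shell instead of editing the script. A sketch, with the same example values as above:

```shell
# Sketch, assuming a Unix-like shell and a spoon.sh that honors a
# pre-set PENTAHO_DI_JAVA_OPTIONS rather than overwriting it.
export PENTAHO_DI_JAVA_OPTIONS="-Xmx2048m -XX:MaxPermSize=1024m"
./spoon.sh
```

This keeps the change out of the installation files, which is convenient when PDI gets upgraded in place.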

Upvotes: 2

Gordon Seidoh Worley

Reputation: 8068

Kettle is very memory-hungry. For example, I typically need 8 GB to run a relatively long and complex process on files of just 250,000 records. So before I run kitchen or pan, I always set JAVAMAXMEM appropriately high. It is set in units of MB, so for 4 GB you'd use:

JAVAMAXMEM=4096 kitchen.sh ...

Upvotes: 0
