Diego Serrano

Reputation: 1036

Pentaho Kettle Process won't finish in Ubuntu

I am running Pentaho Data Integration on an Ubuntu server. I have multiple jobs that run at different times using Jenkins as the orchestrator. I have noticed that sometimes a PDI Job never ends (I get no exit code and the logs freeze in the middle of the process with no exceptions), and when I check the server's memory, it's fully allocated. This raises the following questions:

  1. Shouldn't Pentaho throw an OutOfMemory Java exception when the server's memory is fully allocated?
  2. Why is the Pentaho biserver-ee being launched if I have no process that runs the server? I only run jobs using kitchen.sh.
  3. Why do I always have a persistent Pentaho process running (see process #2 in the image)? It must be the Pentaho process as the Java parameters are the same as my spoon.sh config, but should it be persistent if all the jobs finished?
  4. Does Spoon/Kettle/PDI/Pentaho start a persistent process to allocate the memory specified with the Xms parameter?
  5. Why is my persistent Pentaho process using one core at 100% all the time?

It makes no sense, as there is nothing currently running. I want to know how I can identify the issue: the logs simply stop printing, so I have no clue where to start.

I attach an image of the three processes that are consuming memory on my server (Jenkins, Pentaho BI Server and Spoon) and the specs of my server, Java and Pentaho setup.

Server specs (it's a virtual machine created using VMware):

  1. OS: Ubuntu 14.04.4
  2. RAM: 12GB
  3. Cores: 4

My Java version is "1.8.0_101"

I changed the memory parameters in spoon.sh as follows:

  1. Xms: 1024m
  2. Xmx: 7GB
  3. XX:MaxPermSize=2GB

[Image: the three processes consuming memory on the server]

Upvotes: 1

Views: 1216

Answers (1)

nsousa

Reputation: 4544

How did you install the Pentaho tools? If you installed the trial version of Pentaho Enterprise Edition, it installs and sets up the server as well as the PDI client (which includes kitchen).

If you then run the ctlscript.sh start script, it'll start the Pentaho server, the repository database (PostgreSQL by default) and all that goes with it.

If you are only running things through kitchen and don't want to use the Pentaho repository, then you can stop the Pentaho server altogether and launch PDI jobs and transformations from the filesystem.
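As a rough illustration of launching a job straight from the filesystem, here is a minimal Python sketch that builds a kitchen command line. The `-file` and `-level` options are the usual Kitchen ones, but the install path and the job file below are made-up placeholders, and `build_kitchen_command` is just a helper name invented for this example.

```python
def build_kitchen_command(kitchen_path, job_file, log_level="Basic"):
    """Build the argument list for running a .kjb job from the filesystem.

    No repository options are passed, so the Pentaho server and its
    repository database do not need to be running.
    """
    return [kitchen_path, f"-file={job_file}", f"-level={log_level}"]


# Hypothetical paths, for illustration only.
cmd = build_kitchen_command(
    "/opt/pentaho/data-integration/kitchen.sh",
    "/home/etl/jobs/nightly_load.kjb",
)
print(" ".join(cmd))
```

The resulting list can be passed as-is to a process launcher, or joined into the shell step of a Jenkins job.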

As for the OutOfMemory (OOM) error: yes, it should throw it. It sometimes happens that PDI stops abruptly and doesn't throw any errors, but most of the time you'll see the OOM message in the logs and the failure is properly caught.
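For the cases where the process hangs without reporting anything, one possible safeguard is to wrap the job in a timeout and scan whatever it printed for the OOM marker. The sketch below is only an assumption about how that could look on a POSIX system such as Ubuntu: `run_job` and the timeout value are hypothetical, and the command is started in its own process group so that the Java process spawned by the shell script is killed too.

```python
import os
import signal
import subprocess

OOM_MARKER = "java.lang.OutOfMemoryError"


def run_job(cmd, timeout_s):
    """Run cmd and return (exit_code, oom_seen).

    exit_code is None when the timeout was hit and the job was killed.
    POSIX only: relies on process groups to kill child processes as well.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so one scan covers both streams
        text=True,
        start_new_session=True,    # new session, so the group id equals proc.pid
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, OOM_MARKER in output
    except subprocess.TimeoutExpired:
        # Kill the whole group (shell script plus the JVM it started).
        os.killpg(proc.pid, signal.SIGKILL)
        output, _ = proc.communicate()
        return None, OOM_MARKER in output
```

A caller such as a Jenkins step could then treat a `None` exit code as a hung job and fail the build instead of waiting indefinitely.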

Upvotes: 1
