Reputation: 113
I need to use ExecuteScript in NiFi to transpose columns, and I am using pyspark to do that.
But the processor fails with "failed to process due to javax.script.ScriptException: ImportError: No module named pyspark in <script> at line number 1:"
I set the Module Directory property of ExecuteScript to the Spark and pyspark paths, like this:
C:\Users\username\Desktop\spark\spark-2.4.3-bin-hadoop2.7\hadoop,
C:\Users\username\Desktop\spark\spark-2.4.3-bin-hadoop2.7\bin\pyspark
But it did not work.
I'm afraid this is a very basic issue, but I have not been able to figure it out after half a day.
Upvotes: 1
Views: 842
Reputation: 14194
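This is likely because the pyspark module is a natively-compiled Python module, and Apache NiFi uses Jython in the ExecuteScript processor. Jython runs on the JVM and can only import pure-Python modules, so adding paths to the Module Directory property does not help. This is a known issue, and the full explanation is here, as well as some work-arounds and details on options.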
The simplest answer is to use ExecuteStreamCommand and pass the necessary flowfile attributes as arguments, and the content as STDIN. The output of the Python script will be returned via STDOUT and captured as the new flowfile content.
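As a rough illustration of that STDIN/STDOUT contract, here is a minimal sketch of a script that ExecuteStreamCommand could run with a regular Python 3 interpreter. It assumes the flowfile content is delimited text and does the transposition with the standard csv module rather than pyspark, purely to keep the example self-contained. The function name transpose_csv and the convention that the first argument carries the delimiter are assumptions made for this example, not anything NiFi prescribes.

```python
import csv
import sys
from itertools import zip_longest


def transpose_csv(in_stream, out_stream, delimiter=","):
    """Read delimited rows from in_stream and write them transposed to out_stream."""
    rows = list(csv.reader(in_stream, delimiter=delimiter))
    writer = csv.writer(out_stream, delimiter=delimiter, lineterminator="\n")
    # zip_longest turns rows into columns; rows that are too short are padded with "".
    for column in zip_longest(*rows, fillvalue=""):
        writer.writerow(column)


def main(argv):
    # Assumed convention for this sketch: the first argument is the delimiter,
    # e.g. filled in from a flowfile attribute by the processor configuration.
    delimiter = argv[1] if len(argv) > 1 else ","
    # The flowfile content arrives on STDIN; whatever is written to STDOUT
    # becomes the content of the outgoing flowfile.
    transpose_csv(sys.stdin, sys.stdout, delimiter)


if __name__ == "__main__":
    main(sys.argv)
```

In ExecuteStreamCommand, Command Path would point at the Python executable and Command Arguments at the script path followed by any attribute values. Because the script then runs outside Jython, it should also be able to import pyspark, provided pyspark is installed for that interpreter.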
Upvotes: 3