mitroberts

Reputation: 223

nifi pyspark - "no module named boto3"

I'm trying to run a PySpark job I created that downloads and uploads data from S3 using the boto3 library. The job runs fine in PyCharm, but when I try to run it in NiFi using this template https://github.com/Teradata/kylo/blob/master/samples/templates/nifi-1.0/template-starter-pyspark.xml the ExecutePySpark processor fails with "No module named boto3".

I made sure boto3 is installed in my conda environment, and that environment is active.

Any ideas? I'm sure I'm missing something obvious.

Here is a picture of the nifi spark processor.


Thanks, tim

Upvotes: 1

Views: 1319

Answers (1)

Sivaprasanna Sethuraman

Reputation: 4132

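The Python interpreter that PySpark runs on is configured via the PYSPARK_PYTHON environment variable. If it is not set, Spark uses whichever python it finds by default, which may not be the conda environment where boto3 is installed.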

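  • Go to the Spark installation directory
  • Go to the conf directory
  • Edit spark-env.sh (create it from spark-env.sh.template if it does not exist)
  • Add this line: export PYSPARK_PYTHON=PATH_TO_YOUR_CONDA_ENV/bin/python (the Python executable inside your conda environment, not just the environment directory)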
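To check which interpreter the job actually picks up, a small diagnostic like the one below may help. It is only a sketch, assuming Python 3; describe_python_env is a hypothetical helper name, not part of Spark, NiFi, or Kylo. It reports the running interpreter, the current value of PYSPARK_PYTHON, and whether boto3 can be found by that interpreter.

```python
import importlib.util
import os
import sys


def describe_python_env(module_name="boto3"):
    """Hypothetical helper: describe the interpreter running this code.

    Returns the interpreter path, the PYSPARK_PYTHON value (None if unset),
    and whether `module_name` can be located by this interpreter.
    """
    return {
        "executable": sys.executable,
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # Print one "key: value" line per entry so it is easy to spot in the logs.
    for key, value in describe_python_env().items():
        print(f"{key}: {value}")
```

Calling describe_python_env() at the top of the PySpark script and logging the result should show the interpreter used by the driver process; if "executable" is not the conda environment's python, the PYSPARK_PYTHON setting is probably not being applied. Executors may use a different interpreter, so this checks only the process it runs in.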

Upvotes: 2
