Reputation: 664
I am trying to run `pip install py4j` in Native mode of the Python Evaluator plugin, but I can't find anywhere in Data Fusion to run this command and install the dependency. I haven't been able to find a solution anywhere on the web either. How can I execute this command in Data Fusion?
Thanks in advance!
Upvotes: 1
Views: 992
Reputation: 1
Yes, Tlaquetzal is right. Basically, you have two ways to achieve this:
1. Use a fixed cluster and set up the Remote Hadoop Provisioner in CDAP.
2. Create a custom image with the library, for example using this script:
```bash
#!/bin/bash
# Install Python 3.7, pip and py4j without interactive prompts
set -euxo pipefail
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y python3.7 python3-pip
pip3 install py4j
```
Upvotes: 0
Reputation: 2850
There's no straightforward way to do this, because you cannot modify the Dataproc cluster that runs the pipeline from within the pipeline itself. So, if you really need to use the Python plugin in Native mode, my suggestion is to create a cluster that has the py4j library installed, and then connect it to Data Fusion using the "Remote Hadoop provisioner".
Note that to use this provisioner you'll need to create a new Compute Profile, which is only available in the Enterprise edition of Data Fusion.
To install the py4j library on your cluster, you can create a custom image that includes the library, provide an initialization action script that installs it, or SSH into the machines and run `pip install py4j` manually.
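For the initialization-action route, a minimal sketch might look like this. It assumes an install script (such as the one in the other answer) is saved locally as `install-py4j.sh`; the bucket, region and cluster names are placeholders, not values from the question. By default it only prints the commands (`DRY_RUN=1`), so you can review them before running them for real.

```shell
#!/bin/bash
set -euo pipefail

# Placeholder values (assumptions): replace with your own resources.
BUCKET="gs://my-bucket"
REGION="us-central1"
CLUSTER="py4j-cluster"
SCRIPT="install-py4j.sh"

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stage the install script in Cloud Storage so Dataproc can fetch it.
run gsutil cp "$SCRIPT" "$BUCKET/scripts/$SCRIPT"

# Create the cluster; the initialization action runs on every node
# while the cluster is being created.
run gcloud dataproc clusters create "$CLUSTER" \
  --region="$REGION" \
  --initialization-actions="$BUCKET/scripts/$SCRIPT"
```

Once the cluster is up, you would point the Remote Hadoop provisioner in your Compute Profile at it.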
Upvotes: 1