Reputation: 111
I'm trying to minimize changes in my code so I'm wondering if there is a way to submit a spark-streaming job from my personal PC/VM as follows:
spark-submit --class path.to.your.Class --master yarn --deploy-mode client \
[options] <app jar> [app options]
without using GCP SDK.
I also have to point HADOOP_CONF_DIR at a directory of configuration files,
which I was able to download from Ambari.
Is there a way to do the same?
Thank you
Upvotes: 2
Views: 931
Reputation: 1588
Setting up an external machine as a YARN client node is generally difficult to do and not a workflow that will work easily with Dataproc.
In a comment you mention that what you really want to do is invoke a service or script when a batch completes (via StreamingListener.onBatchCompleted?).
Again, configuring a client node outside of the Dataproc cluster and getting it to work with spark-submit is not going to work directly. However, you can configure your network so that the Spark driver (running within Dataproc) has access to the service/script you need to run, and then invoke it from the driver when desired.
If you run your service on a VM that has access to the network of the Dataproc cluster, then your Spark driver should be able to access the service.
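As a rough illustration of the driver-side call, here is a minimal sketch that uses only the Python standard library. It assumes the external service accepts a JSON POST over HTTP; the URL, the payload field names, and both function names are hypothetical and not part of any Spark or Dataproc API. The idea is that the onBatchCompleted callback would call notify_batch_completed with whatever batch details you care about.

```python
import json
import urllib.request


def build_notification(service_url, batch_time_ms, num_records):
    """Build a POST request describing one completed batch.

    The payload shape (batchTimeMs, numRecords) is an assumption made for
    this sketch; use whatever your service actually expects.
    """
    payload = json.dumps(
        {"batchTimeMs": batch_time_ms, "numRecords": num_records}
    ).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_batch_completed(service_url, batch_time_ms, num_records, timeout=5.0):
    """Send the notification and return the HTTP status code.

    Intended to run inside the Spark driver, so service_url must be
    reachable from the Dataproc cluster's network.
    """
    request = build_notification(service_url, batch_time_ms, num_records)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, with the service listening on a VM in the same network at a hypothetical address, the callback could call notify_batch_completed("http://10.0.0.5:8080/batch-done", batch_time_ms, num_records). A short timeout keeps a slow or unreachable service from stalling the driver for long.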
Upvotes: 1