Reputation: 931
When I go to https://cloud.google.com/dataproc, I see this ...
"Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks."
But gcloud dataproc jobs submit doesn't list all of them. It lists only 8 (hadoop, hive, pig, presto, pyspark, spark, spark-r, spark-sql). Any idea why?
~ gcloud dataproc jobs submit
ERROR: (gcloud.dataproc.jobs.submit) Command name argument expected.
Available commands for gcloud dataproc jobs submit:
hadoop Submit a Hadoop job to a cluster.
hive Submit a Hive job to a cluster.
pig Submit a Pig job to a cluster.
presto Submit a Presto job to a cluster.
pyspark Submit a PySpark job to a cluster.
spark Submit a Spark job to a cluster.
spark-r Submit a SparkR job to a cluster.
spark-sql Submit a Spark SQL job to a cluster.
For detailed information on this command and its flags, run:
gcloud dataproc jobs submit --help
Upvotes: 3
Views: 367
Reputation: 26478
Some OSS components are offered as Dataproc Optional Components. Not all of them have a job submit API: some (e.g., Anaconda, Jupyter) don't need one, and others (e.g., Flink, Druid) might get one in the future.
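For example, Flink is enabled when the cluster is created and then driven with its own tooling on the cluster, rather than through a gcloud dataproc jobs submit subcommand. A minimal sketch (the cluster name, region, and image version here are placeholders):

# Create a cluster with the Flink optional component enabled.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=2.0 \
    --optional-components=FLINK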
Other OSS components are offered as libraries rather than job types, e.g., the GCS connector, the BigQuery connector, and Apache Parquet. You use those through one of the existing job types, as sketched below.
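For instance, a hedged sketch of pulling the BigQuery connector into a PySpark job (the script name, cluster name, region, and connector jar path are illustrative):

# Submit a PySpark job that loads the BigQuery connector as an extra jar.
gcloud dataproc jobs submit pyspark my_job.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar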
Upvotes: 2