Reputation: 1601
I am new to using cloud services, and navigating Google Cloud Platform is quite intimidating. When it comes to Google Dataproc, they do advertise Hadoop, Spark and Hive.
My question is, is Impala available at all?
I would like to do some benchmarking projects using all four of these tools, and I require Apache Impala alongside Spark/Hive.
Upvotes: 2
Views: 1632
Reputation: 177
Cloud Dataproc supports Hadoop, Spark, Hive, and Pig by default on the cluster. You can optionally install additional supported components such as ZooKeeper, Jupyter, Anaconda, Kerberos, Druid, and Presto (you can find the complete list here). In addition, you can install a large set of open source components using initialization actions.
Impala is not supported as an optional component, and there is no initialization-action script for it yet. You could get it to work on Dataproc with HDFS, but making it work with GCS may require non-trivial changes.
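To make the optional-components mechanism a bit more concrete, here is a minimal Python sketch that assembles a `gcloud dataproc clusters create` command. The cluster name, region, and component names are placeholder assumptions for illustration; check the current optional-components list for the values your image version accepts. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_create_command(cluster, region, optional_components=()):
    """Assemble a `gcloud dataproc clusters create` command as an argv list.

    The cluster name, region, and component names are illustrative
    placeholders supplied by the caller, not values taken from this thread.
    """
    cmd = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
    ]
    # Only add the flag when at least one component was requested.
    if optional_components:
        cmd.append("--optional-components=" + ",".join(optional_components))
    return cmd


if __name__ == "__main__":
    # Hypothetical cluster name and region; adjust to your project.
    command = build_create_command(
        "benchmark-cluster", "us-central1", ["ZOOKEEPER", "PRESTO"]
    )
    print(shlex.join(command))
```

Note that Impala cannot be requested this way, since it is not on the optional-components list.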
Upvotes: 0
Reputation: 2099
Dataproc gives you SSH access to the master and worker nodes, so it is possible to install additional software. According to the Impala documentation, you would need to:
Remember that it is recommended to install the impalad daemon on each DataNode.
Upvotes: 1
Reputation: 1743
You can also try creating a new Dataproc instance with extra components instead of using the default one.
For example, you can create a Dataproc instance with HUE (Hadoop User Experience), an interface for managing Hadoop clusters built by Cloudera. The advantage here is that HUE includes Apache Impala as a default component. It also has Pig, Hive, etc. So it's a pretty good solution for using Impala.
Another option is to build your own cluster from scratch, but that is not a good idea unless you want to customize everything. That way, you can install Impala yourself.
Here is a link for more information:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/hue
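As a rough sketch of how the HUE initialization action could be attached at cluster creation, the Python snippet below builds the corresponding `gcloud` command. The regional bucket path for `hue.sh` is an assumption based on how the public initialization-actions scripts are usually published, and the cluster name and region are hypothetical; verify the script location against the repository linked above. Once the cluster is up, confirm which query engines HUE actually exposes.

```python
import shlex


def hue_init_action_uri(region):
    """Return the assumed GCS location of the HUE initialization action.

    Assumption: the scripts from the linked repository are mirrored in
    regional buckets named like this. Verify the path before relying on it.
    """
    return f"gs://goog-dataproc-initialization-actions-{region}/hue/hue.sh"


def build_hue_cluster_command(cluster, region):
    """Assemble a cluster-creation command that runs the HUE script."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--initialization-actions={hue_init_action_uri(region)}",
    ]


if __name__ == "__main__":
    # Hypothetical cluster name and region; adjust to your project.
    print(shlex.join(build_hue_cluster_command("hue-cluster", "us-central1")))
```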
Upvotes: 2
Reputation: 966
No. Dataproc's default images support Hadoop, Spark, Hive, and Pig.
See this link for more information about the list of native Dataproc images:
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions
Upvotes: 2