crystyxn

Reputation: 1601

Does Google Dataproc support Apache Impala?

I am new to cloud services, and navigating Google Cloud Platform is quite intimidating. For Google Dataproc, they advertise Hadoop, Spark and Hive.

My question is, is Impala available at all?

I would like to do some benchmarking projects using all four of these tools, so I need Apache Impala alongside Spark/Hive.

Upvotes: 2

Views: 1632

Answers (4)

Aniket Mokashi

Reputation: 177

Cloud Dataproc supports Hadoop, Spark, Hive and Pig by default on the cluster. You can also install optionally supported components such as Zookeeper, Jupyter, Anaconda, Kerberos, Druid and Presto (you can find the complete list here). In addition, you can install a large set of open source components using initialization actions.
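As a rough sketch of how these two mechanisms are requested at cluster creation time, the Python snippet below assembles a `gcloud dataproc clusters create` command line. The cluster name, region, component choice and the `gs://my-bucket/install-extra.sh` script path are placeholders assumed for illustration, and the component names that are accepted depend on the image version.

```python
import shlex


def build_create_command(cluster, region, components=(), init_actions=()):
    """Assemble a `gcloud dataproc clusters create` command as an argv list."""
    cmd = ["gcloud", "dataproc", "clusters", "create", cluster, f"--region={region}"]
    if components:
        # Optional components are passed as a single comma-separated flag value.
        cmd.append("--optional-components=" + ",".join(components))
    if init_actions:
        # Initialization actions are scripts in Cloud Storage, given as gs:// URIs.
        cmd.append("--initialization-actions=" + ",".join(init_actions))
    return cmd


if __name__ == "__main__":
    argv = build_create_command(
        cluster="benchmark-cluster",  # hypothetical cluster name
        region="us-central1",         # assumed region
        components=["ZOOKEEPER", "PRESTO"],
        init_actions=["gs://my-bucket/install-extra.sh"],  # hypothetical script
    )
    # Print the command for review instead of running it.
    print(shlex.join(argv))
```

Building the command as a list and printing it keeps the sketch free of side effects; you could pass the same list to `subprocess.run` once the values match your project.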

Impala is not supported as an optional component, and there is no initialization action script for it yet. You could get it to work on Dataproc with HDFS, but making it work with GCS may require non-trivial changes.

Upvotes: 0

rsantiago

Reputation: 2099

Dataproc gives you SSH access to the master and workers, so it is possible to install additional software yourself by following the installation steps in the Impala documentation.

Remember that it is recommended to install impalad on each DataNode host.
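To illustrate the per-node approach, here is a small sketch that generates one `gcloud compute ssh` invocation per worker. It assumes the usual Dataproc naming of primary workers (`<cluster>-w-0`, `<cluster>-w-1`, ...). The cluster name, zone and `install_impalad.sh` are hypothetical placeholders; the actual install steps have to come from the Impala documentation.

```python
import shlex


def worker_hostnames(cluster, num_workers):
    """Primary worker VM names, assuming the `<cluster>-w-<index>` pattern."""
    return [f"{cluster}-w-{i}" for i in range(num_workers)]


def ssh_commands(cluster, num_workers, zone, remote_command):
    """Build one `gcloud compute ssh` argv list per worker (DataNode host)."""
    return [
        ["gcloud", "compute", "ssh", host, f"--zone={zone}", f"--command={remote_command}"]
        for host in worker_hostnames(cluster, num_workers)
    ]


if __name__ == "__main__":
    # `install_impalad.sh` is a hypothetical script you would write yourself.
    commands = ssh_commands(
        "benchmark-cluster",  # hypothetical cluster name
        2,                    # number of primary workers
        "us-central1-a",      # assumed zone
        "sudo bash install_impalad.sh",
    )
    for argv in commands:
        # Print each command for review instead of executing it.
        print(shlex.join(argv))
```

Software installed by hand this way is lost when a node is recreated, so the same script would likely be better packaged as an initialization action once it works.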

Upvotes: 1

Kenry Sanchez

Reputation: 1743

You can also try creating a new Dataproc cluster with extra components instead of using the default one.

For example, you can create a Dataproc cluster with HUE (Hadoop User Experience), a web interface for working with Hadoop clusters that was built by Cloudera. The advantage here is that HUE includes Apache Impala as a default component. It also has Pig, Hive, etc. So it's a pretty good solution for using Impala.

Another option is to build your own cluster from scratch, but that is not a good idea unless you want to customize everything. That way, you can install Impala yourself.

Here is a link with more information:

https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/hue

Upvotes: 2

Chaotic Pechan

Reputation: 966

No. With the default images, a Dataproc cluster supports Hadoop, Spark, Hive and Pig.

See this link for the list of native Dataproc image versions:

https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions

Upvotes: 2
