Reputation: 141
I want to create an Apache Flink standalone cluster with several TaskManagers. I would like to use HDFS and Hive, so I have to add some Hadoop dependencies.
After reading the documentation, the recommended way is to set the HADOOP_CLASSPATH environment variable. But how do I add the Hadoop files? Should I download the Hadoop distribution to some directory like /opt/hadoop on the TaskManagers and point the variable to that path?
I only know the old, deprecated way of downloading an uber JAR with the dependencies and placing it under the /lib folder.
Upvotes: 0
Views: 1070
Reputation: 9245
Normally you'd do the standard Hadoop installation, since (for HDFS) you need DataNodes running on every server (with appropriate configuration), plus the NameNode running on your master server.
So then you can do something like this on the master server where you're submitting your Flink workflow:
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/etc/hadoop/conf
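If the Flink nodes don't already have a full Hadoop installation, a common alternative is to unpack a plain Hadoop binary release on each JobManager/TaskManager machine and derive the classpath from it. This is only a minimal sketch of that idea; the /opt/hadoop path, the Hadoop 3.3.6 version, and the config location are assumptions, not something Flink prescribes:
# assumed: a Hadoop binary release unpacked on every Flink node
cd /opt
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar xzf hadoop-3.3.6.tar.gz && ln -s hadoop-3.3.6 hadoop

# point Flink at the Hadoop jars and configuration
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`

# then start the standalone cluster (from the Flink distribution directory)
# so the JobManager/TaskManager processes inherit these variables
./bin/start-cluster.sh
The exports need to be visible to the processes on every node, e.g. set in each machine's environment before start-cluster.sh runs, not just in one interactive shell.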
Upvotes: 0