Reputation: 83
I am setting environment variables in my bootstrap code:
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
This is followed by the use of one of the variables defined above:
$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/
The execution fails with the error messages:
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found
cycle0.sh is the name of my bootstrap script.
Any comments as to what is happening here?
Upvotes: 4
Views: 4730
Reputation: 1828
You can configure such Spark-specific (and other) environment variables with configuration classifications; see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
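As a rough sketch of what such a classification could look like for the JAVA_HOME value from the question (assuming the hadoop-env classification with its nested export classification; the file name configurations.json is just a placeholder, and the file is created locally before launching the cluster):

# Run locally; pass the file when creating the cluster, e.g.
#   aws emr create-cluster ... --configurations file://./configurations.json
cat > configurations.json <<'EOF'
[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "JAVA_HOME": "/usr/lib64/jvm/java-7-oracle/"
        }
      }
    ],
    "Properties": {}
  }
]
EOF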
Another (rather dirty) option is to append export FOO=bar lines to .bashrc in the bootstrap action.
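A minimal sketch of that dirty option, reusing the values from the question and assuming the default hadoop user's home directory on the EMR nodes:

#!/bin/bash
# Bootstrap action: append the exports to the hadoop user's .bashrc
# so later login shells pick them up (values taken from the question).
cat >> /home/hadoop/.bashrc <<'EOF'
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
EOF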
Upvotes: 0
Reputation: 1076
To get back to the topic of the question: it seems that environment variables can't be set from just any bootstrap code; they can only be set or updated from a bootstrap script that must be named hadoop-user-env.sh.
More details here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
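A rough sketch of such a script, reusing the values from the question (the exact variables you need may differ):

#!/bin/bash
# hadoop-user-env.sh - bootstrap action that EMR recognizes for setting
# Hadoop environment variables (values taken from the question).
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/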
Upvotes: 1
Reputation: 952
I think you don't need the environment variable; just change fs to hadoop fs.
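Reading that as calling the hadoop binary directly instead of going through $HADOOP_CMD, the two commands from the question would become (assuming hadoop is on the PATH of the bootstrap shell):

hadoop fs -mkdir /home/hadoop/contents
hadoop fs -put /home/hadoop/contents/* /home/hadoop/contents/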
Upvotes: 0
Reputation: 83
I found a proper solution to my problem. My attempts to copy data files from S3 to EMR using hadoop fs commands were futile. I have just learned about the S3DistCp command available in EMR for file transfers, so I am skipping the $HADOOP_CMD approach. For those who care how S3DistCp works: Link to AWS EMR Docs. I still do not understand why the bootstrap script will not accept an environment variable in subsequent statements.
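For the record, an S3DistCp copy run on the cluster looks roughly like this (a sketch only; the bucket name is a placeholder, and on older AMI versions S3DistCp has to be submitted as a hadoop jar step rather than via the s3-dist-cp command):

s3-dist-cp --src s3://my-bucket/contents/ --dest hdfs:///home/hadoop/contents/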
Upvotes: 1