I am currently trying to run an EMR job on the local file system. For EMR, the local file system is on the EC2 instances that the EMR job creates. I followed this link: Is it possible to run Hadoop in Pseudo-Distributed operation without HDFS?
The configuration seems quite simple: set fs.default.name in core-site.xml to file:///, and Hadoop will then run on the local file system instead of HDFS.
(I first tried this configuration with Hadoop on my local machine (Red Hat). There, setting fs.default.name to file:/// doesn't work, but file://home/<username>/ makes Hadoop run smoothly.)
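For reference, this is the shape of the entry I expect to end up in core-site.xml (fs.default.name is the old-style key used by Hadoop 1.x; newer Hadoop releases call the same setting fs.defaultFS):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>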
I change this value by adding a bootstrap action when creating the job flow:
./elastic-mapreduce --create --alive --subnet <subnet-id> --instance-type c3.2xlarge --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-c,fs.default.name=file:///"
following this EMR document: Create Bootstrap Actions
The bootstrap action always succeeds, and its logs say it successfully changes this value in core-site.xml. But Hadoop always fails to launch after this bootstrap action, giving me this error:
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
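(In case it matters, the value can also be checked directly on the master node. The commands below assume the EMR CLI's --ssh option and the usual config location /home/hadoop/conf/core-site.xml on the current AMI; the path may differ on other AMI versions.)
./elastic-mapreduce --jobflow <jobflow-id> --ssh
grep -A 1 fs.default.name /home/hadoop/conf/core-site.xml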
I've also tried fs.default.name=file://home/hadoop/:
java.net.UnknownHostException: Invalid hostname for server: home
Or, fs.default.name=file:///home/hadoop/:
java.lang.IllegalArgumentException
Or, fs.default.name=file://127.0.0.1/home/hadoop/:
For this one, the namenode log file doesn't even give an error message. It doesn't contain a SHUTDOWN message as the other failures do; it just terminates abruptly.
Does EMR Hadoop work on the local file system at all? How do you configure it to do so?
Upvotes: 0
Views: 1777