I am currently trying to run an EMR job on the local file system. For EMR, the local file system is on the EC2 instances that the EMR job creates. I followed this link: Is it possible to run Hadoop in Pseudo-Distributed operation without HDFS?
The configuration seems quite simple: set fs.default.name in core-site.xml to file:///, and Hadoop will then run on the local file system instead of HDFS.
(I first tried this configuration with Hadoop on my local machine (Red Hat). There, setting fs.default.name to file:/// doesn't work, but file://home/<username>/ makes Hadoop run smoothly.)
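For reference, this is the shape of the entry I expect to end up in core-site.xml (fs.default.name is the old-style key used by Hadoop 1.x; newer Hadoop releases call the same setting fs.defaultFS):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>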
I change this value by adding a bootstrap action when creating the job flow:
./elastic-mapreduce --create --alive --subnet <subnet-id> --instance-type c3.2xlarge --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-c,fs.default.name=file:///"
following this EMR document: Create Bootstrap Actions
The bootstrap action always succeeds, and its logs say it successfully changes this value in core-site.xml. But Hadoop always fails to launch after this bootstrap action, giving me this error:
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
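(In case it matters, the value can also be checked directly on the master node. The commands below assume the EMR CLI's --ssh option and the usual config location /home/hadoop/conf/core-site.xml on the current AMI; the path may differ on other AMI versions.)
./elastic-mapreduce --jobflow <jobflow-id> --ssh
grep -A 1 fs.default.name /home/hadoop/conf/core-site.xml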
I've also tried fs.default.name=file://home/hadoop/:
java.net.UnknownHostException: Invalid hostname for server: home
Or, fs.default.name=file:///home/hadoop/:
java.lang.IllegalArgumentException
Or, fs.default.name=file://127.0.0.1/home/hadoop/:
For this one, the namenode log file doesn't even give an error message. It doesn't contain a SHUTDOWN message as the other failures do; it just terminates abruptly.
Does EMR Hadoop work on the local file system at all? How do you configure it to do so?
Upvotes: 0
Views: 1777