Reputation: 3253
I'm using the command below to launch a cluster:
./elastic-mapreduce --create \
--stream \
--cache s3n://bucket_name/code/totalInstallUsers#totalInstallUsers \
--input s3n://bucket_name/input \
--output s3n://bucket_name/output \
--mapper s3n://bucket_name/code/mapper.py \
--reducer s3n://bucket_name \
--jobflow-role EMR_EC2_DefaultRole \
--service-role EMR_DefaultRole \
--debug \
--log-uri s3n://bucket_name/logs
and I always get the error message below. If I remove the --cache option, the cluster launches successfully.
Error: undefined method `each' for #<String:0x00000002c28ba0>
/home/ubuntu/data_processing/commands.rb:806:in `steps'
/home/ubuntu/data_processing/commands.rb:1232:in `block in enact'
/home/ubuntu/data_processing/commands.rb:1232:in `map'
/home/ubuntu/data_processing/commands.rb:1232:in `enact'
/home/ubuntu/data_processing/commands.rb:49:in `block in enact'
/home/ubuntu/data_processing/commands.rb:49:in `each'
/home/ubuntu/data_processing/commands.rb:49:in `enact'
/home/ubuntu/data_processing/commands.rb:2422:in `create_and_execute_commands'
/home/ubuntu/data_processing/elastic-mapreduce-cli.rb:13:in `'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
./elastic-mapreduce:6:in `'
The reason for using --cache is that I want mapper.py to be able to open the data file with "with open('./totalInstallUsers', 'r') as infile:".
Could anyone give me a clue? Thanks.
Upvotes: 1
Views: 293
Reputation: 3253
Posting the solution I found, in case it helps others. Using the AWS CLI (aws emr), the command looks like:
aws emr create-cluster \
--name "cluster--name" \
--enable-debugging \
--log-uri s3://bucket-name/logs \
--ami-version 3.7.0 \
--use-default-roles \
--ec2-attributes KeyName=your-key \
--instance-type m3.xlarge \
--instance-count 3 \
--auto-terminate \
--steps file://./streaming.json
And streaming.json looks like:
[
  {
    "Type": "STREAMING",
    "Name": "Streaming program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": [
      "-files", "s3://bucket-name/code/mapper.py,s3://bucket-name/code/reducer.py",
      "-mapper", "mapper.py",
      "-reducer", "reducer.py",
      "-input", "s3://bucket-name/input",
      "-output", "s3://bucket-name/output",
      "-cacheFile", "s3://bucket-name/code/data-file-name#new-file-name"
    ]
  }
]
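With -cacheFile, the part after the # is the name under which the file is linked into each task's working directory, so the mapper can open it with a relative path. Below is a minimal sketch of what mapper.py might look like on that basis. The tab-separated layout of the data file, the function names load_lookup, map_lines and main, and the filtering logic are all assumptions made for illustration, not something taken from the original job.

```python
#!/usr/bin/env python
import sys

# Name after '#' in the -cacheFile argument; Hadoop links the file
# into the task's working directory under this name.
CACHE_FILE = "./new-file-name"


def load_lookup(path):
    """Read the cached file into a dict.

    Assumption: one 'key<TAB>value' pair per line. Adjust the parsing
    to whatever the real data file contains.
    """
    lookup = {}
    with open(path, "r") as infile:
        for line in infile:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            lookup[key] = value
    return lookup


def map_lines(lines, lookup):
    """Yield 'key<TAB>value' for input lines whose first field is in lookup."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in lookup:
            yield "%s\t%s" % (fields[0], lookup[fields[0]])


def main():
    """Streaming entry point: read stdin, write mapped records to stdout."""
    lookup = load_lookup(CACHE_FILE)
    for record in map_lines(sys.stdin, lookup):
        print(record)
```

Ending mapper.py with a call to main() makes it usable as the -mapper script. Keeping the parsing in separate functions also lets you test it locally against a small copy of the data file before launching a cluster.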
Upvotes: 1