Some basic questions about running MapReduce programs with mrjob on Amazon EMR

Question

I am new to mrjob and I am having trouble getting a job to run on Amazon EMR. I will list my problems in order.

1. I can run an mrjob job on my local machine. However, when I have an mrjob.conf at /home/ankit/.mrjob.conf and at /etc/mrjob.conf, the job no longer runs on my local machine. Here is the output I get: https://s3-ap-southeast-1.amazonaws.com/imagna.sample/local.txt
2. What is MRJOB_CONF in "the location specified by MRJOB_CONF" in the documentation?
3. What is 'base_tmp_directory' used for? Also, do I need to upload the input data to S3 before starting the job, or will it be uploaded from my local computer when the job starts?
4. Do I need to do some bootstrapping if I use libraries such as numpy, scikit, etc.? If so, how?
5. This is the output I get when I run the command to start a job on EMR: https://s3-ap-southeast-1.amazonaws.com/imagna.sample/emr.txt

Any solutions?

Thanks a lot.

Answers (1)