user3797438

Reputation: 485

To debug map reduce jobs in eclipse

I want to debug Map-Reduce jobs (Pig, Hive) using Eclipse: that is, set breakpoints in the Hadoop source Java files and inspect variables while a map-reduce job is running. I started all the services from Eclipse and I can debug some class files, but I can't set up a complete debug environment. Can anyone tell me how?

Upvotes: 1

Views: 1360

Answers (3)

Jonathan L

Reputation: 10698

I created an Eclipse project to debug a generic MapReduce program, for example WordCount.java, running standalone Hadoop in Eclipse, though I have not tried Hive/Pig-specific MapReduce jobs yet. The project is located at https://github.com/drachenrio/hadoopmr and can be downloaded using

git clone https://github.com/drachenrio/hadoopmr

This project was created with Ubuntu 16.04.2, Eclipse Neon.3 Release (4.6.3RC2), jdk1.8.0_121, and hadoop-2.7.3.

Quick setup:
1) Once the project is imported into Eclipse, open .classpath and
    replace /j01/srv/hadoop-2.7.3 with your Hadoop installation home path
2) mkdir -p /home/hadoop/input
    copy src/main/resources/input.txt to /home/hadoop/input/

It is then ready to run/debug the WordCount.java MapReduce job.
Read README.md for more details.
If you prefer to create the project manually, see my other answer on Stack Overflow.
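For reference, the job being debugged is the classic WordCount. A minimal sketch along the lines of the standard Hadoop example is shown below; the hard-coded input/output paths match step 2 above and are my assumptions, not necessarily the exact code in the repository:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());   // a good place for a breakpoint
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Paths assumed per step 2 above; the output dir must not already exist.
    FileInputFormat.addInputPath(job, new Path("/home/hadoop/input"));
    FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}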

Upvotes: 0

Manish Verma

Reputation: 781

The basic thing to remember here is that debugging a Hadoop MR job works like remote-debugging any other Java application in Eclipse.

As you may know, Hadoop can be run in the local environment in three different modes:

  1. Local Mode
  2. Pseudo Distributed Mode
  3. Fully Distributed Mode (Cluster)

Typically you will run your local Hadoop setup in Pseudo Distributed Mode to leverage HDFS and MapReduce (MR). However, you cannot debug MR programs in this mode because each map/reduce task runs in a separate JVM process, so you need to switch back to Local Mode, where your MR program runs in a single JVM process.

Here are the quick and simple steps to debug this in your local environment:

  1. Run Hadoop in Local Mode for debugging, so mapper and reducer tasks run in a single JVM instead of separate JVMs. The steps below set this up.

  2. Configure HADOOP_OPTS to enable debugging, so that when you run your Hadoop job it waits for the debugger to connect. Below is the command to open the debug agent on port 8008:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8008"

  3. Configure the fs.default.name value in core-site.xml to file:/// instead of hdfs://. You won't be using HDFS in Local Mode.

  4. Configure the mapred.job.tracker value in mapred-site.xml to local. This instructs Hadoop to run MR tasks in a single JVM (see the sketch after this list).

  5. Create a debug configuration in Eclipse and set the port to 8008 – typical stuff. Go to the debugger configurations, create a new "Remote Java Application" type of configuration, and set the port to 8008 in its settings.

  6. Run your Hadoop job (it will wait for the debugger to connect), then launch Eclipse in debug mode with the above configuration. Do make sure to set a breakpoint first.
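If you launch the job from your own driver class rather than editing the XML files, the same two settings from steps 3 and 4 can be applied programmatically. This is a sketch under that assumption (the class name is illustrative, and the property names are the older pre-YARN ones used in this answer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalDebugDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Step 3: use the local filesystem instead of HDFS.
    conf.set("fs.default.name", "file:///");   // fs.defaultFS on newer Hadoop
    // Step 4: run all map/reduce tasks in a single local JVM.
    conf.set("mapred.job.tracker", "local");   // mapreduce.framework.name=local on YARN/MRv2
    Job job = Job.getInstance(conf, "local-debug-job");
    // ...set mapper/reducer/input/output as usual, then job.waitForCompletion(true)
  }
}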

That's it.

Upvotes: 1

SparkleGoat

Reputation: 513

I do not know of an Eclipse tool that does exactly what you are looking for, but if you want a possible workaround, the following works in Java.

import java.util.logging.Logger;

For debugging Java MapReduce programs, you can use the Java logger in each class (driver, mapper, reducer):

Logger log = Logger.getLogger(MyClass.class.getName());

To inspect elements/variables, just use:

log.info("varOne: " + varOne);

These log lines will be printed on the administration page for your job.
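Put together, a mapper instrumented this way might look like the sketch below (class and variable names are illustrative, not from the question):

import java.io.IOException;
import java.util.logging.Logger;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final Logger log = Logger.getLogger(TokenMapper.class.getName());
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Inspect each raw input line in the task logs.
        log.info("input line: " + value);
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}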

Upvotes: 1
