akash sharma

Reputation: 461

Oozie shell action - running sqoop command and need logging details

I am working with Oozie, a shell action, and Sqoop.

I am using Oozie to run many Sqoop commands. I have set up a shell action, and the shell script it runs contains many Sqoop commands. The shell action is triggered and the Sqoop imports do run.

However, there is no proper logging for Sqoop. I redirected the output of the sqoop command to a log file, but that file contains only the line shown under "Log details captured" below.

My code is as follows. Inside the shell script:

sqoop import --connect jdbc:mysql://server:3306/test --verbose --username root --password Password --append --table People --m 1 --hive-drop-import-delims --target-dir /user/username/20/ --delete-target-dir >> /tmp/log

Log details captured:

Warning: /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.

How can I get the Sqoop logs when running it from a shell action in Oozie? I need details such as how many records were loaded/pulled, plus the usual logging that appears when the sqoop command is run on the console.

Please find below my jobproperties.xml:

oozie.use.system.libpath=True
credentials={u'hcat': {'xml_name': u'hcat', 'properties': [('hcat.metastore.uri', u'thrift://node:9083'), ('hcat.metastore.principal', u'hive/[email protected]')]}, u'hive2': {'xml_name': u'hive2', 'properties': [('hive2.jdbc.url', 'jdbc:hive2://node.jnj.com:10000/default'), ('hive2.server.principal', 'hive/[email protected]')]}, u'hbase': {'xml_name': u'hbase', 'properties': []}}
nameNode=hdfs://nameservice1
jobTracker=yarnRM
oozie.sqoop.log.level=DEBUG
log4jConfig=debug-log.properties
oozie.libpath=/user/oozie/share/lib

Upvotes: 2

Views: 1900

Answers (1)

Samson Scharfrichter

Reputation: 9067

You have a parameter (log4jConfig) hinting that Log4J should use a specific properties file, but Sqoop is never instructed to use it: there is no -Dlog4j.configuration=...file name without path... on the command line.
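As a rough sketch of what "instructing Sqoop" might look like inside the shell script: the snippet below passes the JVM option through HADOOP_OPTS. This relies on an assumption not stated above, namely that the sqoop wrapper launches through bin/hadoop and that bin/hadoop forwards HADOOP_OPTS to the JVM. The file name debug-log.properties is taken from the log4jConfig parameter in the question and would have to be reachable on the CLASSPATH.

```shell
#!/bin/sh
# Assumption: the sqoop launcher honours HADOOP_OPTS (via bin/hadoop).
# File name only, no path: Log4J then looks it up on the CLASSPATH.
LOG4J_FILE="debug-log.properties"

# Append to whatever HADOOP_OPTS already holds instead of overwriting it.
HADOOP_OPTS="${HADOOP_OPTS:-} -Dlog4j.configuration=${LOG4J_FILE}"
export HADOOP_OPTS

echo "HADOOP_OPTS=${HADOOP_OPTS}"

# The sqoop commands would follow here unchanged, e.g.
#   sqoop import ... >> /tmp/log
```

The option is set once at the top of the script so that every sqoop command that follows inherits it.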

OK, let's assume that is done on purpose (?); the problem is then that

  • Log4J searches for a properties file with the default name, i.e. log4j.properties
  • the search is done in the directories present in the CLASSPATH, stopping at the first match
  • the default CLASSPATH for an Oozie shell is the Hadoop JARs, then the Hadoop conf dir, then the current working dir (the place where all <file> dependencies are dumped along with the Oozie JARs), then all these app/Oozie JARs
  • the first match for log4j.properties happens to be a file that Log4J cannot open
  • thus Log4J does not log anything anywhere (????????)

A possible workaround would be

  1. create a custom log4j.properties -- cf. 1st example in that post to log anything tagged INFO and above (i.e. INFO, WARN, ERROR but not DEBUG) to StdOut
  2. upload that file to HDFS somewhere, then tell the Oozie Action to download it to the container with a <file> element
  3. tell the Oozie Action to request that its CLASSPATH starts with current working dir, by setting a property such as oozie.launcher.mapreduce.task.classpath.first to true (actual property may depend on your Hadoop version, see that post and that JIRA)

Note that step 3 is only necessary because a file with the default name is present elsewhere in the CLASSPATH; if Sqoop were instructed to use a file with a different name, there would be no ambiguity.
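A minimal sketch of step 1, written as a shell snippet that generates the file locally. The appender name "stdout" and the conversion pattern are arbitrary choices for illustration, and the HDFS path in the comment for step 2 is hypothetical; only the Log4J 1.x property keys themselves are standard.

```shell
#!/bin/sh
# Step 1: write a custom log4j.properties that sends INFO and above
# (INFO, WARN, ERROR, but not DEBUG) to StdOut.
cat > log4j.properties <<'EOF'
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
EOF

# Step 2 (not executed here; the target path is a hypothetical example):
#   hdfs dfs -put -f log4j.properties /user/username/oozie-app/log4j.properties
# The shell action would then reference that HDFS path in a <file> element.
```

Because the appender targets System.out, the existing ">> /tmp/log" redirection in the script would pick the messages up without further changes.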

Upvotes: 2
