John x

Reputation: 4031

Display the imported table in HDFS using Sqoop

I have set up a single-node Hadoop cluster and configured it to work with Apache Hive. When I import a MySQL table using the following Sqoop command:

sqoop import --connect jdbc:mysql://localhost/dwhadoop --table orders --username root --password 123456 --hive-import

it runs successfully, though some exceptions are thrown. After that, when I run:

hive> show tables;

it does not list the orders table.

If I run the import command again, it gives me an error saying that the orders directory already exists.

Please help me find the solution.

EDIT:

I haven't created any tables prior to the import. Do I have to create the orders table in Hive before running the import? If I import another table, Customers, it gives me the following exception:

[root@localhost root-647263876]# sqoop import --connect jdbc:mysql://localhost/dwhadoop  --table Customers --username root --password 123456 --hive-import
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: $HADOOP_HOME is deprecated.

12/08/05 07:30:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/08/05 07:30:25 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
12/08/05 07:30:25 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
12/08/05 07:30:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/08/05 07:30:26 INFO tool.CodeGenTool: Beginning code generation
12/08/05 07:30:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customers` AS t LIMIT 1
12/08/05 07:30:26 INFO orm.CompilationManager: HADOOP_HOME is /home/enigma/hadoop/libexec/..
Note: /tmp/sqoop-root/compile/e48d4803894ee63079f7194792d624ed/Customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/08/05 07:30:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e48d4803894ee63079f7194792d624ed/Customers.jar
12/08/05 07:30:28 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/08/05 07:30:28 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/08/05 07:30:28 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/08/05 07:30:28 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/08/05 07:30:28 INFO mapreduce.ImportJobBase: Beginning import of Customers
12/08/05 07:30:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/08/05 07:30:29 INFO mapred.JobClient: Running job: job_local_0001
12/08/05 07:30:29 INFO util.ProcessTree: setsid exited with exit code 0
12/08/05 07:30:29 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11f41fd
12/08/05 07:30:29 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
12/08/05 07:30:30 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/08/05 07:30:30 INFO mapred.LocalJobRunner: 
12/08/05 07:30:30 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/08/05 07:30:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to Customers
12/08/05 07:30:30 INFO mapred.JobClient:  map 0% reduce 0%
12/08/05 07:30:32 INFO mapred.LocalJobRunner: 
12/08/05 07:30:32 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/08/05 07:30:33 INFO mapred.JobClient:  map 100% reduce 0%
12/08/05 07:30:33 INFO mapred.JobClient: Job complete: job_local_0001
12/08/05 07:30:33 INFO mapred.JobClient: Counters: 13
12/08/05 07:30:33 INFO mapred.JobClient:   File Output Format Counters 
12/08/05 07:30:33 INFO mapred.JobClient:     Bytes Written=45
12/08/05 07:30:33 INFO mapred.JobClient:   File Input Format Counters 
12/08/05 07:30:33 INFO mapred.JobClient:     Bytes Read=0
12/08/05 07:30:33 INFO mapred.JobClient:   FileSystemCounters
12/08/05 07:30:33 INFO mapred.JobClient:     FILE_BYTES_READ=3205
12/08/05 07:30:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=52579
12/08/05 07:30:33 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
12/08/05 07:30:33 INFO mapred.JobClient:   Map-Reduce Framework
12/08/05 07:30:33 INFO mapred.JobClient:     Map input records=3
12/08/05 07:30:33 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/08/05 07:30:33 INFO mapred.JobClient:     Spilled Records=0
12/08/05 07:30:33 INFO mapred.JobClient:     Total committed heap usage (bytes)=21643264
12/08/05 07:30:33 INFO mapred.JobClient:     CPU time spent (ms)=0
12/08/05 07:30:33 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/08/05 07:30:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
12/08/05 07:30:33 INFO mapred.JobClient:     Map output records=3
12/08/05 07:30:33 INFO mapreduce.ImportJobBase: Transferred 45 bytes in 5.359 seconds (8.3971 bytes/sec)
12/08/05 07:30:33 INFO mapreduce.ImportJobBase: Retrieved 3 records.
12/08/05 07:30:33 INFO hive.HiveImport: Loading uploaded data into Hive
12/08/05 07:30:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customers` AS t LIMIT 1
12/08/05 07:30:33 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot run program "hive": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    at java.lang.Runtime.exec(Runtime.java:615)
    at java.lang.Runtime.exec(Runtime.java:526)
    at org.apache.sqoop.util.Executor.exec(Executor.java:76)
    at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:344)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:297)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:393)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:454)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
    ... 15 more

But then, if I run the import again, it says:

Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: $HADOOP_HOME is deprecated.

12/08/05 07:33:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/08/05 07:33:48 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
12/08/05 07:33:48 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
12/08/05 07:33:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/08/05 07:33:48 INFO tool.CodeGenTool: Beginning code generation
12/08/05 07:33:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customers` AS t LIMIT 1
12/08/05 07:33:49 INFO orm.CompilationManager: HADOOP_HOME is /home/enigma/hadoop/libexec/..
Note: /tmp/sqoop-root/compile/9855cf7de9cf54c59095fb4bfd65a369/Customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/08/05 07:33:50 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/9855cf7de9cf54c59095fb4bfd65a369/Customers.java to /app/hadoop/tmp/mapred/staging/root-647263876/./Customers.java
java.io.IOException: Destination '/app/hadoop/tmp/mapred/staging/root-647263876/./Customers.java' already exists
    at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
    at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:368)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:454)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
12/08/05 07:33:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9855cf7de9cf54c59095fb4bfd65a369/Customers.jar
12/08/05 07:33:51 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/08/05 07:33:51 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/08/05 07:33:51 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/08/05 07:33:51 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/08/05 07:33:51 INFO mapreduce.ImportJobBase: Beginning import of Customers
12/08/05 07:33:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/08/05 07:33:52 INFO mapred.JobClient: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/root-195281052/.staging/job_local_0001
12/08/05 07:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory Customers already exists
12/08/05 07:33:52 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory Customers already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:119)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:179)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:381)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:454)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

Upvotes: 0

Views: 4461

Answers (2)

grepIt

Reputation: 116

  1. You have to include --create-hive-table in your command if you want Sqoop to create the table in Hive and load the data into it (see the sketch after this list).
  2. When you import data into Hive, Sqoop creates a temporary HDFS directory as a staging area before finally loading the data into the table. Make sure that directory doesn't already exist.
  3. It also looks like Sqoop's working directory doesn't have enough privileges to make filesystem changes. Make sure the user running the import owns the related files.
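
A minimal sketch of what that could look like, reusing the connection details from the question (the flag is --create-hive-table; the cleanup path assumes Sqoop's default output directory named after the table, which may differ in your setup):

# let Sqoop create the Hive table itself and load the imported data into it
# (-P prompts for the password instead of passing it on the command line)
sqoop import --connect jdbc:mysql://localhost/dwhadoop \
  --table Customers --username root -P \
  --hive-import --create-hive-table

# clear any leftover output/staging directory from a failed run before retrying
hadoop fs -rmr Customers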

Upvotes: 0

Carter Shanklin

Reputation: 3047

The main thing to note is that your original import fails because Sqoop tries to invoke hive, but it's not on your PATH. Fix that problem before continuing.
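
For example, assuming Hive is installed under /home/enigma/hive (a guess based on the HADOOP_HOME in your log; adjust to wherever Hive actually lives on your machine):

# make the hive executable visible to Sqoop
export HIVE_HOME=/home/enigma/hive
export PATH=$PATH:$HIVE_HOME/bin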

Then you should just find and remove the Customers directory (it's on the local filesystem, not in HDFS, since your job ran with the local job runner) and try again.
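
Something like the following, assuming the job wrote its output relative to your working directory, as the "Saved output of task ... to Customers" log line suggests:

# the job ran with the local job runner, so the output directory is local
rm -r Customers

# if a later run writes to HDFS instead, the equivalent would be
hadoop fs -rmr Customers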

From what I've seen, errors of the form "Customers.java already exists" are not fatal.
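
If you want to clean them up anyway, removing the leftover generated file from the staging path shown in your log should do it:

rm /app/hadoop/tmp/mapred/staging/root-647263876/Customers.java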

Upvotes: 1
