Reputation: 19896
This is a follow-up to my earlier question, "sqoop export local csv to MySQL error on mapreduce".
I was able to run the Sqoop job and load the data into MySQL from a local .csv file using the command below:
$ sqoop export -fs local -jt local -D 'mapreduce.application.framework.path=/usr/hdp/2.5.0.0-1245/hadoop/mapreduce.tar.gz' --connect jdbc:mysql://172.52.21.64:3306/cf_ae07c762_41a9_4b46_af6c_a29ecb050204 --username username --password password --table test3 --export-dir file:///home/username/folder/test3.csv
However, even though the records were exported successfully (I checked in MySQL), I still saw the error ERROR tool.ExportTool: Error during export: Export job failed!
Full logs below:
17/04/10 09:36:28 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/04/10 09:36:28 INFO mapreduce.Job: Running job: job_local2136897360_0001
17/04/10 09:36:28 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/10 09:36:28 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2136897360_0001_m_000000_0
17/04/10 09:36:28 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/04/10 09:36:28 INFO mapred.MapTask: Processing split: Paths:/home/username/folder/test3.csv:36+7,/home/username/folder/test3.csv:43+8
17/04/10 09:36:28 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
17/04/10 09:36:28 INFO mapred.LocalJobRunner:
17/04/10 09:36:28 INFO mapred.Task: Task:attempt_local2136897360_0001_m_000000_0 is done. And is in the process of committing
17/04/10 09:36:28 INFO mapred.LocalJobRunner: map
17/04/10 09:36:28 INFO mapred.Task: Task 'attempt_local2136897360_0001_m_000000_0' done.
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2136897360_0001_m_000000_0
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2136897360_0001_m_000001_0
17/04/10 09:36:28 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/04/10 09:36:28 INFO mapred.MapTask: Processing split: Paths:/home/username/folder/test3.csv:0+12
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper:
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: Exception raised during data export
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper:
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data: 'id'
at test3.__loadFromFields(test3.java:316)
at test3.parse(test3.java:254)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: For input string: "id"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at test3.__loadFromFields(test3.java:303)
... 13 more
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: Dumping data is not allowed by default, please run the job with -Dorg.apache.sqoop.export.text.dump_data_on_error=true to get corrupted line.
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: On input file: file:/home/username/folder/test3.csv
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: At position 0
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper:
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: Currently processing split:
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: Paths:/home/username/folder/test3.csv:0+12
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper:
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: This issue might not necessarily be caused by current input
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper: due to the batching nature of export.
17/04/10 09:36:28 ERROR mapreduce.TextExportMapper:
17/04/10 09:36:28 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2136897360_0001_m_000002_0
17/04/10 09:36:28 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/04/10 09:36:28 INFO mapred.MapTask: Processing split: Paths:/home/username/folder/test3.csv:12+12
17/04/10 09:36:28 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
17/04/10 09:36:28 INFO mapred.LocalJobRunner:
17/04/10 09:36:28 INFO mapred.Task: Task:attempt_local2136897360_0001_m_000002_0 is done. And is in the process of committing
17/04/10 09:36:28 INFO mapred.LocalJobRunner: map
17/04/10 09:36:28 INFO mapred.Task: Task 'attempt_local2136897360_0001_m_000002_0' done.
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2136897360_0001_m_000002_0
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2136897360_0001_m_000003_0
17/04/10 09:36:28 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/04/10 09:36:28 INFO mapred.MapTask: Processing split: Paths:/home/username/folder/test3.csv:24+12
17/04/10 09:36:28 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
17/04/10 09:36:28 INFO mapred.LocalJobRunner:
17/04/10 09:36:28 INFO mapred.Task: Task:attempt_local2136897360_0001_m_000003_0 is done. And is in the process of committing
17/04/10 09:36:28 INFO mapred.LocalJobRunner: map
17/04/10 09:36:28 INFO mapred.Task: Task 'attempt_local2136897360_0001_m_000003_0' done.
17/04/10 09:36:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2136897360_0001_m_000003_0
17/04/10 09:36:28 INFO mapred.LocalJobRunner: map task executor complete.
17/04/10 09:36:28 WARN mapred.LocalJobRunner: job_local2136897360_0001
java.lang.Exception: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:122)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Can't parse input data: 'id'
at test3.__loadFromFields(test3.java:316)
at test3.parse(test3.java:254)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
... 11 more
Caused by: java.lang.NumberFormatException: For input string: "id"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at test3.__loadFromFields(test3.java:303)
... 13 more
17/04/10 09:36:29 INFO mapreduce.Job: Job job_local2136897360_0001 running in uber mode : false
17/04/10 09:36:29 INFO mapreduce.Job: map 100% reduce 0%
17/04/10 09:36:29 INFO mapreduce.Job: Job job_local2136897360_0001 failed with state FAILED due to: NA
17/04/10 09:36:29 INFO mapreduce.Job: Counters: 15
File System Counters
FILE: Number of bytes read=673345391
FILE: Number of bytes written=679694703
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=388
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=2805989376
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/04/10 09:36:29 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 5.4541 seconds (0 bytes/sec)
17/04/10 09:36:29 INFO mapreduce.ExportJobBase: Exported 2 records.
17/04/10 09:36:29 ERROR mapreduce.ExportJobBase: Export job failed!
17/04/10 09:36:29 ERROR tool.ExportTool: Error during export: Export job failed!
Any idea or should I just ignore? I don't want to leave this as-is and then miss a real problem when running larger jobs.
UPDATE 1
Below is the .csv content without empty line or space
Here is the result after sqoop
and it was fine:
Upvotes: 1
Views: 2791
Reputation: 18270
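The error is caused by the CSV header in the file. The mapper that reads the first split tries to parse the header field 'id' as an integer, which raises the NumberFormatException shown in the log. Sqoop does not have an option to skip the header when exporting data into MySQL, so you have to remove the header manually before running sqoop-export.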
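One way to remove the header is to write a copy of the file that starts at the second line. The snippet below is a minimal sketch; the function name strip_csv_header and the output file name are made up for illustration, and it assumes the header occupies exactly one line.

```shell
# Hypothetical helper: copy a CSV to a new file without its first (header) line.
# Assumes the header is exactly one line long.
strip_csv_header() {
    src="$1"
    dst="$2"
    # tail -n +2 prints everything starting at line 2
    tail -n +2 "$src" > "$dst"
}

# Example call (the output path is illustrative):
# strip_csv_header /home/username/folder/test3.csv /home/username/folder/test3_noheader.csv
```

After that, --export-dir would point at the header-free copy instead of the original file.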
Any idea or should I just ignore?
Since the header is only one line, the mapper processing the split that contains it throws exceptions, but they are not severe enough to kill the job. Then again, who likes to see an exception in the job execution log?
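For larger jobs, a quick pre-flight check can flag a leftover header before the export starts. This is only a sketch under assumptions: the function name has_header_line is hypothetical, the file is comma-delimited, and the first column is an integer (as the Integer.valueOf frame in the stack trace suggests for test3).

```shell
# Hypothetical pre-flight check: returns 0 (true) if the first field of the
# first line is not an integer, i.e. the file probably starts with a header.
# Assumes a comma-delimited file whose first column is an integer.
has_header_line() {
    first_field=$(head -n 1 "$1" | cut -d ',' -f 1)
    case "$first_field" in
        ''|*[!0-9-]*) return 0 ;;  # empty, or contains a non-digit character
        *) return 1 ;;             # looks like an integer, so likely a data row
    esac
}

# Example call (path taken from the question):
# if has_header_line /home/username/folder/test3.csv; then
#     echo "header found, strip it before running sqoop export" >&2
# fi
```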
Upvotes: 1