Reputation: 95
I have tried running the commands below in Sqoop2:
This one works; it creates tab-separated part files (part-m-00000, part-m-00001, etc.):
sqoop import --connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME --username god --table TABLENAME --fields-terminated-by '\t' --lines-terminated-by '\n' -P
This one fails:
sqoop import -Dmapreduce.job.user.classpath.first=true \
-Dmapreduce.output.basename=`date +%Y-%m-%d` \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username nbkeplo \
--P \
--table TABLENAME \
--columns "COL1, COL2, COL3" \
--target-dir /usr/data/sqoop \
-–as-parquetfile \
-m 10
Error:
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -–as-parquetfile
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: 10
Try --help for usage instructions.
I want the output to be a .parquet file and not a Hive table (I want to use it with Apache Spark directly, without going through Hive). Is this .parquet file creation possible with Sqoop import?
Upvotes: 0
Views: 149
Reputation: 95
The command below works:
sqoop import \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username user \
--target-dir /xxx/yyy/zzz \
--as-parquetfile \
--table TABLE1 \
-P
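Since the goal is to consume this with Spark directly, a minimal sanity check is to point Spark at the target directory; this is a sketch assuming the placeholder path above and a Spark 2.x+ spark-shell (where the spark session is predefined):

# Read the imported Parquet files straight from HDFS; no Hive involved.
echo 'spark.read.parquet("/xxx/yyy/zzz").show(5)' | spark-shell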
Upvotes: 0
Reputation: 600
Importing directly to HDFS (as Avro, SequenceFile, or Parquet files) is possible with Sqoop. When you output to Hive, the data is still written to HDFS, just inside the Hive warehouse directory for managed tables. Also, Spark is able to read from any HDFS location it has permission to access.
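To illustrate that last point, a Hive-managed table is just files under the warehouse directory, which any HDFS client can list; /user/hive/warehouse is a common default location, and mydb.db/tablename is a placeholder, not something from this thread:

# The data behind a Hive-managed table is ordinary HDFS files; Spark (or any
# client with permission) can read them without going through Hive.
hdfs dfs -ls /user/hive/warehouse/mydb.db/tablename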
Your two code snippets are not the same, and you didn't mention which troubleshooting steps you tried.
I would add the --split-by, --fields-terminated-by, and --lines-terminated-by arguments to your command.
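For example, here is a sketch combining your working command with those flags (the connection details, table, and split column are placeholders carried over from this thread; note the two delimiter flags only affect text output and are ignored for Parquet):

# --split-by names the column used to divide rows among the mappers (-m 10);
# it is required when the table has no single-column primary key.
sqoop import \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username user \
-P \
--table TABLE1 \
--columns "COL1, COL2, COL3" \
--split-by COL1 \
--as-parquetfile \
--target-dir /xxx/yyy/zzz \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
-m 10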
Upvotes: 0