Reputation: 95
We have followed the below steps ,
imported a table from MySQL to HDFS location user/hive/warehouse/orders/
, the table schema as
mysql> describe orders;
+-------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| order_id | int(11) | YES | | NULL | |
| order_date | varchar(30) | YES | | NULL | |
| order_customer_id | int(11) | YES | | NULL | |
| order_items | varchar(30) | YES | | NULL | |
+-------------------+-------------+------+-----+---------+-------+
Created an External Table in Hive using the same data from (1).
CREATE EXTERNAL TABLE orders
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');
Sqoop Command :
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=root \
--password=cloudera \
--table orders \
--target-dir /user/hive/warehouse/retail_stage.db/orders \
--as-avrodatafile \
--split-by order_id
Describe formatted orders , returning error , tried many combination but failed.
hive> describe orders;
OK
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from deserializer
literal string from deserializer
Time taken: 1.15 seconds, Fetched: 7 row(s)
Same thing worked for --as-textfile
, where as throwing error in case of --as-avrodatafile
.
Referred some stack overflow but did not able to resolve. Any idea?
Upvotes: 0
Views: 761
Reputation: 118
I think the reference to avro schema file in TBLPROPERTIES should be checked.
does following resolve?
hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
I was able to create exact scenario and select from hive table.
hive> CREATE EXTERNAL TABLE sqoop_test
> COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/user/cloudera/categories/'
> TBLPROPERTIES
> ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
> ;
OK Time taken: 1.471 seconds
hive> select * from sqoop_test;
OK
1 2 Football
2 2 Soccer
3 2 Baseball & Softball
Upvotes: 0