Suman Banerjee

Reputation: 95

Not able to query records from Hive when data is stored in Avro format; returns "error_error..." exception

We followed the steps below:

  1. Imported a table from MySQL to the HDFS location user/hive/warehouse/orders/. The table schema is:

    mysql> describe orders;
    +-------------------+-------------+------+-----+---------+-------+
    | Field             | Type        | Null | Key | Default | Extra |
    +-------------------+-------------+------+-----+---------+-------+
    | order_id          | int(11)     | YES  |     | NULL    |       |
    | order_date        | varchar(30) | YES  |     | NULL    |       |
    | order_customer_id | int(11)     | YES  |     | NULL    |       |
    | order_items       | varchar(30) | YES  |     | NULL    |       |
    +-------------------+-------------+------+-----+---------+-------+
    
  2. Created an external table in Hive over the same data from step 1:

    CREATE EXTERNAL TABLE orders
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
    TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');
    

    Sqoop command:

     sqoop import \
      --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
      --username=root \
      --password=cloudera \
      --table orders \
      --target-dir /user/hive/warehouse/retail_stage.db/orders \
      --as-avrodatafile \
      --split-by order_id
    
  3. Describing the table returns garbage columns instead of the schema. I tried many combinations but failed:

    hive> describe orders;
    OK
    error_error_error_error_error_error_error   string                  from deserializer   
    cannot_determine_schema string                  from deserializer   
    check                   string                  from deserializer   
    schema                  string                  from deserializer   
    url                     string                  from deserializer   
    and                     string                  from deserializer   
    literal                 string                  from deserializer   
    Time taken: 1.15 seconds, Fetched: 7 row(s)
    

The same flow works with --as-textfile, whereas it throws this error with --as-avrodatafile.
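For context, --as-avrodatafile also makes Sqoop generate the orders.avsc schema file in its local compile directory, while the data files land in the HDFS target dir. A quick sanity check that both exist (a sketch; the paths are taken from the commands above and may differ on your machine) is:

```shell
# Avro data files written by the Sqoop import (HDFS target dir from the command above)
hdfs dfs -ls /user/hive/warehouse/retail_stage.db/orders

# Generated Avro schema: Sqoop writes the .avsc to its local compile directory
ls /tmp/sqoop-cloudera/compile/*/orders.avsc
```

If the .avsc is missing locally, the avro.schema.url in TBLPROPERTIES points at a file that no longer exists.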

I have referred to some Stack Overflow posts but was not able to resolve it. Any ideas?

Upvotes: 0

Views: 761

Answers (1)

Sagar Shah

Reputation: 118

I think the reference to the Avro schema file in TBLPROPERTIES should be checked.

Does the following command succeed?

hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
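If that file cannot be read (Sqoop's compile directory under /tmp is temporary and is often cleaned up), one common fix is to copy the generated .avsc to a stable HDFS location and repoint the table at it. A sketch, with the destination paths assumed:

```shell
# Copy the Sqoop-generated schema to a stable HDFS path
# (destination paths are assumptions; substitute your own)
hdfs dfs -mkdir -p /user/cloudera/schemas
hdfs dfs -put orders.avsc /user/cloudera/schemas/orders.avsc

# Repoint the table at the stable copy
hive -e "ALTER TABLE orders SET TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc');"
```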

I was able to recreate the exact scenario and select from the Hive table:

hive> CREATE EXTERNAL TABLE sqoop_test
    > COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    > STORED AS
    >   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    >   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    > LOCATION '/user/cloudera/categories/'
    > TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
    > ;

OK
Time taken: 1.471 seconds

 hive> select * from sqoop_test;
 OK
 1  2   Football
 2  2   Soccer
 3  2   Baseball & Softball

Upvotes: 0
