Can we use bucketing in a Hive table backed by an Avro schema?

Question

I am trying to create a Hive table backed by an Avro schema. Below is the DDL:

CREATE TABLE avro_table
ROW FORMAT 
  SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'    
CLUSTERED BY (col_name) INTO N BUCKETS    
STORED AS 
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  
TBLPROPERTIES ( 'avro.schema.url' = 'hdfs://sandbox.hortonworks.com:8020/avroschema/test_schema.avsc')

But it throws the following error:

FAILED: ParseException line 3:3 missing EOF at 'clustered' near ''org.apache.hadoop.hive.serde2.avro.AvroSerDe''

I am not sure whether bucketing can be used in a Hive table backed by Avro.

Hive version: 1.2

Can anyone help or suggest a way to achieve this?

Tom Harrison · Accepted Answer

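Your clauses are in the wrong order, and a column definition is missing. In a CREATE TABLE statement, ROW FORMAT comes after CLUSTERED BY, not before it, which is why the parser stops at 'clustered'. CLUSTERED BY also requires a column name, which presumably needs to be defined as part of the CREATE TABLE command.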

I assume the N in N BUCKETS is really replaced with your actual number of buckets, but if not, that's another error.

I have formatted the query in your question so that I could read it. Comparing it against the documented CREATE TABLE syntax then made it easier to spot what the parser didn't like.
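For illustration, here is a sketch of the same DDL with the clauses reordered as described above. The column names (`id`, `name`) and the bucket count of 4 are hypothetical placeholders, not taken from the question. Substitute the fields actually declared in `test_schema.avsc` and your real bucket count. I am assuming the explicit column list should mirror the Avro schema so that the CLUSTERED BY column resolves; I have not verified this against Hive 1.2.

```sql
-- Sketch only: id, name and the bucket count are assumed placeholders.
CREATE TABLE avro_table (
  id   INT,      -- hypothetical column; should match a field in the .avsc
  name STRING    -- hypothetical column
)
-- CLUSTERED BY must come before ROW FORMAT
CLUSTERED BY (id) INTO 4 BUCKETS
ROW FORMAT
  SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url' = 'hdfs://sandbox.hortonworks.com:8020/avroschema/test_schema.avsc');
```

The only structural changes from the original are the added column list and moving CLUSTERED BY above ROW FORMAT; the SerDe, input/output formats, and schema URL are unchanged.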
