Joey.Chang

Reputation: 164

Hive can't create table with nested Avro schema

I'm trying to use a nested Avro schema to create a Hive table, but it does not work. I'm using Hive 1.1 on CDH 5.7.2.

Here is my nested Avro schema:

[
    {
        "type": "record",
        "name": "Id",
        "namespace": "com.test.app_list",
        "doc": "Device ID",
        "fields": [
            {
                "name": "idType",
                "type": "int"
            },{
                "name": "id",
                "type": "string"
            }
        ]
    },

    {
        "type": "record",
        "name": "AppList",
        "namespace": "com.test.app_list",
        "doc": "",
        "fields": [
            {
                "name": "appId",
                "type": "string",
                "avro.java.string": "String"
            },
            {
                "name": "timestamp",
                "type":  "long"
            },

            {
                "name": "idList",
                "type": [{"type": "array", "items": "com.test.app_list.Id"}]
            }

        ]
    }
]

And my SQL to create the table:

CREATE EXTERNAL TABLE app_list
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='/hive/schema/test_app_list.avsc');

But Hive gives me:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: UNION)

The Hive documentation says the AvroSerDe "Supports arbitrarily nested schemas" (from: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHive).

Data sample:

{
    "appId":{"string":"com.test.app"},
    "timestamp":{"long":1495893601606},
    "idList":{
        "array":[
            {"idType":15,"id":"6c:5c:14:c3:a5:39"},
            {"idType":13,"id":"eb297afe56ff340b6bb7de5c5ab09193"}
        ]
    }

}

I don't know how to get this working and would appreciate help fixing it. Thanks!

Upvotes: 2

Views: 2075

Answers (1)

hlagos

Reputation: 7947

Hive expects the top level of your Avro schema to be a record type. Your file is a JSON array of two records, which Avro parses as a union, and that is why Hive rejects it. A workaround could be to make the top level a single record and declare the two records as the types of fields inside it. Each Avro field needs its own "name" and "type"; the field names "id" and "appList" below are only placeholders.

{
    "type": "record",
    "name": "myRecord",
    "namespace": "com.test.app_list",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "record",
                "name": "Id",
                "doc": "Device ID",
                "fields": [
                    {
                        "name": "idType",
                        "type": "int"
                    },{
                        "name": "id",
                        "type": "string"
                    }
                ]
            }
        },
        {
            "name": "appList",
            "type": {
                "type": "record",
                "name": "AppList",
                "doc": "",
                "fields": [
                    {
                        "name": "appId",
                        "type": "string",
                        "avro.java.string": "String"
                    },
                    {
                        "name": "timestamp",
                        "type":  "long"
                    },
                    {
                        "name": "idList",
                        "type": [{"type": "array", "items": "com.test.app_list.Id"}]
                    }
                ]
            }
        }
    ]
}
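
Another way to satisfy the top-level record requirement is to keep AppList itself as the top-level record and define Id inline at the point where idList first uses it. This is a sketch under that assumption and has not been tested on Hive 1.1 / CDH 5.7.2. Everything else is copied from the question's schema, including the single-branch union around the array:

```json
{
    "type": "record",
    "name": "AppList",
    "namespace": "com.test.app_list",
    "doc": "",
    "fields": [
        {
            "name": "appId",
            "type": "string",
            "avro.java.string": "String"
        },
        {
            "name": "timestamp",
            "type": "long"
        },
        {
            "name": "idList",
            "type": [
                {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "Id",
                        "doc": "Device ID",
                        "fields": [
                            {
                                "name": "idType",
                                "type": "int"
                            },
                            {
                                "name": "id",
                                "type": "string"
                            }
                        ]
                    }
                }
            ]
        }
    ]
}
```

With this layout the table columns should be appId, timestamp and idList directly, rather than struct columns nested under a wrapper record.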

Upvotes: 1
