kpeng


Creating a Hive table over a complex Parquet file

I am trying to create a Hive table on top of a Parquet file that I created from the following JSON records:
{"user_id":"4513","providers":[{"id":"4220","name":"dbmvl","behaviors":{"b1":"gxybq","b2":"ntfmx"}},{"id":"4173","name":"dvjke","behaviors":{"b1":"sizow","b2":"knuuc"}}]}

{"user_id":"3960","providers":[{"id":"1859","name":"ponsv","behaviors":{"b1":"ahfgc","b2":"txpea"}},{"id":"103","name":"uhqqo","behaviors":{"b1":"lktyo","b2":"ituxy"}}]}

{"user_id":"567","providers":[{"id":"9622","name":"crjju","behaviors":{"b1":"rhaqc","b2":"npnot"}},{"id":"6965","name":"fnheh","behaviors":{"b1":"eipse","b2":"nvxqk"}}]}

I used Spark SQL to read the JSON and write it out as a Parquet file.

I am running into issues creating a Hive table over the resulting Parquet file. Here is the HQL I have:

create table test (mycol STRUCT<user_id:String, providers:ARRAY<STRUCT<id:String, name:String, behaviors:MAP<String, String>>>>) stored as parquet;
Alter table test set location 'hdfs:///tmp/test.parquet';

The above statements execute fine, but I get the following error when I run select * on the table:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Column mycol at index 0 does not exist in {providers=providers, user_id=user_id}


Answers (1)

Dan Osipov


Try changing your CREATE TABLE statement to:

create table test (user_id String, providers ARRAY<STRUCT<id:String, name:String, behaviors:MAP<String, String>>>) stored as parquet;

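The root JSON object is flattened when the Parquet file is written: each of its keys (user_id and providers) becomes a top-level column, so there is no single column named mycol for Hive to find. That is what the error message is saying: it looked for mycol and only found {providers, user_id}.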
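To illustrate the flattening, here is a minimal Python sketch (standard library only) that derives Hive DDL from one sample JSON record, emitting one column per root-level key. The helpers `hive_type` and `hive_ddl` are hypothetical, and the type mapping is a simplification. It assumes Spark's default JSON schema inference, which turns nested JSON objects into structs rather than maps. If that holds for your file, `behaviors` may need to be declared as `STRUCT<b1:STRING, b2:STRING>` instead of `MAP<String, String>`. Running `printSchema()` on the DataFrame in Spark before writing the DDL is the reliable way to confirm the actual types.

```python
import json


def hive_type(value):
    """Map a sample JSON value to a Hive type string (simplified sketch)."""
    if isinstance(value, dict):
        # Assumption: nested objects become structs, as in Spark's JSON inference.
        fields = ", ".join(f"{key}:{hive_type(val)}" for key, val in value.items())
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Assume homogeneous arrays; fall back to STRING for an empty one.
        return f"ARRAY<{hive_type(value[0]) if value else 'STRING'}>"
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"


def hive_ddl(table, record):
    """Emit one column per root-level key: the root object itself is not a column."""
    cols = ",\n  ".join(f"{key} {hive_type(val)}" for key, val in record.items())
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS PARQUET;"


sample = json.loads(
    '{"user_id":"4513","providers":[{"id":"4220","name":"dbmvl",'
    '"behaviors":{"b1":"gxybq","b2":"ntfmx"}}]}'
)
print(hive_ddl("test", sample))
```

The generated statement has top-level columns user_id and providers and no wrapper column, which is why a table declared with a single mycol column cannot be resolved against this file.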

