Ravi Shastri

Reputation: 57

Hive Json SerDE for ORC or RC Format

Is it possible to use a JSON SerDe with the RC or ORC file formats? I am trying to insert into a Hive table stored as ORC on Azure Blob Storage, with the rows serialized as JSON.

Upvotes: 2

Views: 1889

Answers (2)

Pratik Khadloya

Reputation: 12879

You can do this with a conversion step, for example a bucketing job that writes ORC files into a target directory, and then mount a Hive table with the same schema over that directory, like below:

CREATE EXTERNAL TABLE my_fact_orc
(
  mycol STRING,
  mystring INT
)
PARTITIONED BY (dt string)
CLUSTERED BY (some_id) INTO 64 BUCKETS
STORED AS ORC
LOCATION 's3://dev/my_fact_orc'
TBLPROPERTIES ('orc.compress'='SNAPPY');

ALTER TABLE my_fact_orc ADD IF NOT EXISTS PARTITION (dt='2017-09-07') LOCATION 's3://dev/my_fact_orc/dt=2017-09-07';

ALTER TABLE my_fact_orc PARTITION (dt='2017-09-07') SET FILEFORMAT ORC;

SELECT * FROM my_fact_orc WHERE dt='2017-09-07' LIMIT 5;
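The conversion step itself can then be a plain INSERT from a JSON-backed staging table into the ORC table. A minimal sketch, where `my_fact_json` is a hypothetical staging table (its name, location, and the `dt` column mapping are assumptions for illustration):

```sql
-- Hypothetical staging table: raw JSON lines read with the JSON SerDe.
CREATE EXTERNAL TABLE my_fact_json
(
  mycol STRING,
  mystring INT
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://dev/my_fact_json';

-- Conversion: rewrite one day of JSON data as ORC in the target table.
SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE my_fact_orc PARTITION (dt='2017-09-07')
SELECT mycol, mystring
FROM my_fact_json
WHERE dt='2017-09-07';
```

Hive handles the ORC serialization and bucketing on write, so the staging table is the only place the JSON SerDe is needed.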

Upvotes: 0

David דודו Markovitz

Reputation: 44951

Apparently not:

insert overwrite local directory '/home/cloudera/local/mytable' 
stored as orc 
select '{"myint":123,"mystring":"Hello"}'
;

create external table verify_data (rec string) 
stored as orc 
location 'file:///home/cloudera/local/mytable'
;

select * from verify_data
;

rec
{"myint":123,"mystring":"Hello"}

create external table mytable (myint int,mystring string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' 
stored as orc
location 'file:///home/cloudera/local/mytable'
;

myint mystring
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text

JsonSerDe.java:

...
import org.apache.hadoop.io.Text;
...

  @Override
  public Object deserialize(Writable blob) throws SerDeException {

    Text t = (Text) blob;
  ...
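The cast fails because the ORC reader hands the SerDe an OrcStruct, while the JSON SerDe's deserialize expects the serialized row as Text, so a text-oriented SerDe can only be paired with text-based storage. A minimal sketch of the supported combination, JSON SerDe over TEXTFILE (the table name and location are assumptions):

```sql
-- JSON SerDe over a text-backed table: the supported combination.
CREATE EXTERNAL TABLE mytable_text (myint INT, mystring STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'file:///home/cloudera/local/mytable_json';
```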

Upvotes: 1
