Reputation: 2294
I want to create the external hive table from nested json data but fields should be flatten out from nested json.
For Example:-
{
"key1":"value1",
"key2":{
"nestedKey1":1,
"nestedKey2":2
}
}
Hive table should have format or fields flatten out like
key1: String, key2.nestedKey1:Int,key2.nestedKey1:Int
Thanks In Advance
Upvotes: 1
Views: 1996
Reputation: 31490
Use JsonSerDe
and create table with below syntax:
hive> create table sample(key1 string,key2 struct<nestedKey1:int,nestedKey2:int>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> select key1,key2.nestedkey1,key2.nestedkey2 from sample;
+---------+-------------+-------------+--+
| key1 | nestedkey1 | nestedkey2 |
+---------+-------------+-------------+--+
| value1 | 1 | 2 |
+---------+-------------+-------------+--+
hive> select * from sample;
+--------------+----------------------------------+--+
| sample.key1 | sample.key2 |
+--------------+----------------------------------+--+
| value1 | {"nestedkey1":1,"nestedkey2":2} |
+--------------+----------------------------------+--+
(or)
If you want to create table with flatten out json fields
then use RegexSerDe
and matching regex to extract nestedkey from the data.
Refer this link for more details regards to regex serde.
UPDATE:
Inputdata:
{"key1":"value1","key2":{"nestedKey1":1,"nestedKey2":2}}
HiveTable:
hive> CREATE TABLE dd (key1 string, nestedKey1 string, nestedKey2 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES
('input.regex'=".*:\"(.*?)\",\"key2\":\\{\"nestedKey1\":(\\d),\"nestedKey2\":(\\d).*$");
Select data from the table:
hive> select * from dd;
+---------+-------------+-------------+--+
| key1 | nestedkey1 | nestedkey2 |
+---------+-------------+-------------+--+
| value1 | 1 | 2 |
+---------+-------------+-------------+--+
Upvotes: 1