Reputation: 2813
I am very new to Hive and i am working on below problem but i couldn't solve it. Please help me.
I have below type of json records.
{"issues":[{"key":"COV-2073","labels":[ "java","db"]}]}
And i would like to un nest it or convert it like below.
Key labels
"COV-2073" "java","db"
I have used below kind of query but i am unable to get my desired output.
select v2.key from demo_example as d lateral view json_tuple(d.a1,'issues') v1 as issue lateral view json_tuple(v1.issue,'key') v2 as key;
Even i am okay with below kind of output also.
Key labels
"COV-2073" "java"
"COV-2073" "db"
Please help me to solve above one.
Upvotes: 1
Views: 3144
Reputation: 1002
You can use the cloudera's Hive JSON serde to parse JSON and load them as CSV. You can download the Serde from here. Add the jar into your Hive classPath or from the hive terminal.
Then you can create a hive table as
CREATE TABLE TEST(issues ARRAY<STRUCT<key:STRING,labels:STRING>>) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe';
Now load your data into the table and test. Now write your select statement as
select issues.key,issues.labels from TEST;
You will get your desired output as
["COV-2073"] ["[java, db]"]
Upvotes: 1
Reputation: 11234
Here is an example in SparkSQL
//data.json
{"name":"John","age":"30","cars": [{ "name":"Ford", "models":["Fiesta", "Focus", "Mustang"]}, {"name":"BMW", "models":["320", "X3", "X5"]}, {"name":"Fiat", "models":["500", "Panda"]}]}
//SparkSQL
>>> sqlContext.sql("""select name,age,col1.name, col2 from json.`data.json` lateral view explode(cars) v1 as col1 lateral view explode(col1.models) v2 as col2""").show()
+----+---+----+-------+
|name|age|name| col2|
+----+---+----+-------+
|John| 30|Ford| Fiesta|
|John| 30|Ford| Focus|
|John| 30|Ford|Mustang|
|John| 30| BMW| 320|
|John| 30| BMW| X3|
|John| 30| BMW| X5|
|John| 30|Fiat| 500|
|John| 30|Fiat| Panda|
+----+---+----+-------+
When some rows of json do not have values for specific columns and if you would like to show NULL
then use lateral view outer
instead of lateral view
.
As an example, the below json has 2 entries, one with all details and one with no cars, models etc.
{"name":"John","age":"30","cars": [{ "name":"Ford", "models":["Fiesta", "Focus", "Mustang"]}, {"name":"BMW", "models":["320", "X3", "X5"]}, {"name":"Fiat", "models":["500", "Panda"]}]}
{"name":"Dough","age":"90"}
In this case, using outer produces null
for entry Dough
>>> sqlContext.sql("""select name,age,col1.name, col2 from json.`data.json` lateral view outer explode(cars) v1 as col1 lateral view outer explode(col1.models) v2 as col2 order by col2""").show()
+-----+---+----+-------+
| name|age|name| col2|
+-----+---+----+-------+
|Dough| 90|null| null|
| John| 30| BMW| 320|
| John| 30|Fiat| 500|
| John| 30|Ford| Fiesta|
| John| 30|Ford| Focus|
| John| 30|Ford|Mustang|
| John| 30|Fiat| Panda|
| John| 30| BMW| X3|
| John| 30| BMW| X5|
+-----+---+----+-------+
If you want all your models as an array, then
>>> sqlContext.sql("""select name, age, car.name as car, car.models from json.`data.json` lateral view outer explode(cars) v1 as car""").show()
+-----+---+----+--------------------+
| name|age| car| models|
+-----+---+----+--------------------+
| John| 30|Ford|[Fiesta, Focus, M...|
| John| 30| BMW| [320, X3, X5]|
| John| 30|Fiat| [500, Panda]|
|Dough| 90|null| null|
+-----+---+----+--------------------+
Upvotes: 1