Tharunkumar Reddy
Tharunkumar Reddy

Reputation: 2813

Converting json data to normal rows in hive

I am very new to Hive and i am working on below problem but i couldn't solve it. Please help me.

I have below type of json records.

{"issues":[{"key":"COV-2073","labels":[ "java","db"]}]}

And i would like to un nest it or convert it like below.

Key          labels
"COV-2073"  "java","db"

I have used below kind of query but i am unable to get my desired output.

select v2.key  from demo_example as d lateral view json_tuple(d.a1,'issues') v1 as issue lateral view json_tuple(v1.issue,'key') v2 as key;

Even i am okay with below kind of output also.

 Key          labels
"COV-2073"    "java"
"COV-2073"    "db"

Please help me to solve above one.

Upvotes: 1

Views: 3144

Answers (2)

Kiran Krishna Innamuri
Kiran Krishna Innamuri

Reputation: 1002

You can use the cloudera's Hive JSON serde to parse JSON and load them as CSV. You can download the Serde from here. Add the jar into your Hive classPath or from the hive terminal.

Then you can create a hive table as

CREATE TABLE TEST(issues ARRAY<STRUCT<key:STRING,labels:STRING>>) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe';

Now load your data into the table and test. Now write your select statement as

select issues.key,issues.labels from TEST;

You will get your desired output as

["COV-2073"]    ["[java, db]"]

Upvotes: 1

Bala
Bala

Reputation: 11234

Here is an example in SparkSQL

//data.json
{"name":"John","age":"30","cars": [{ "name":"Ford", "models":["Fiesta", "Focus", "Mustang"]}, {"name":"BMW", "models":["320", "X3", "X5"]}, {"name":"Fiat", "models":["500", "Panda"]}]}

//SparkSQL
>>> sqlContext.sql("""select name,age,col1.name, col2 from json.`data.json` lateral view explode(cars) v1 as col1 lateral view explode(col1.models) v2 as col2""").show()

+----+---+----+-------+
|name|age|name|   col2|
+----+---+----+-------+
|John| 30|Ford| Fiesta|
|John| 30|Ford|  Focus|
|John| 30|Ford|Mustang|
|John| 30| BMW|    320|
|John| 30| BMW|     X3|
|John| 30| BMW|     X5|
|John| 30|Fiat|    500|
|John| 30|Fiat|  Panda|
+----+---+----+-------+

When some rows of json do not have values for specific columns and if you would like to show NULL then use lateral view outer instead of lateral view.

As an example, the below json has 2 entries, one with all details and one with no cars, models etc.

{"name":"John","age":"30","cars": [{ "name":"Ford", "models":["Fiesta", "Focus", "Mustang"]}, {"name":"BMW", "models":["320", "X3", "X5"]}, {"name":"Fiat", "models":["500", "Panda"]}]}
{"name":"Dough","age":"90"}

In this case, using outer produces null for entry Dough

>>> sqlContext.sql("""select name,age,col1.name, col2 from json.`data.json` lateral view outer explode(cars) v1 as col1 lateral view outer explode(col1.models) v2 as col2 order by col2""").show()

+-----+---+----+-------+
| name|age|name|   col2|
+-----+---+----+-------+
|Dough| 90|null|   null|
| John| 30| BMW|    320|
| John| 30|Fiat|    500|
| John| 30|Ford| Fiesta|
| John| 30|Ford|  Focus|
| John| 30|Ford|Mustang|
| John| 30|Fiat|  Panda|
| John| 30| BMW|     X3|
| John| 30| BMW|     X5|
+-----+---+----+-------+

If you want all your models as an array, then

>>> sqlContext.sql("""select name, age, car.name as car, car.models from json.`data.json` lateral view outer explode(cars) v1 as car""").show()
+-----+---+----+--------------------+
| name|age| car|              models|
+-----+---+----+--------------------+
| John| 30|Ford|[Fiesta, Focus, M...|
| John| 30| BMW|       [320, X3, X5]|
| John| 30|Fiat|        [500, Panda]|
|Dough| 90|null|                null|
+-----+---+----+--------------------+

Upvotes: 1

Related Questions