divinedragon

Reputation: 5336

JSON output format for Hive Query results

Is there any way to convert the Hive query result to JSON format?

Upvotes: 22

Views: 27807

Answers (7)

big_ear_wrund

Reputation: 1

SELECT to_json will work, but all of the columns have to be listed explicitly.
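
For example (a minimal sketch with the Brickhouse to_json from the answers below; the table and column names are made up):

SELECT to_json(named_struct("col1", col1, "col2", col2))
FROM my_table;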

Upvotes: 0

Jianwu Chen

Reputation: 6023

The top answer works. However, I'm a bit too lazy to explicitly build a named_struct with all the column names, and the Brickhouse UDF crashes when I use CREATE TABLE AS SELECT. To solve this, I developed a similar UDF which supports select to_json(*) from tbl;

Usage:

ADD JAR ivy://org.jsonex:HiveUDF:0.1.22?transitive=true;
CREATE TEMPORARY FUNCTION to_json AS 'org.jsonex.hiveudf.ToJsonUDF';
SELECT to_json(*) FROM tbl;

Upvotes: 0

Reihan_amn

Reputation: 2727

List of JSON objects:

If you want to convert the output to JSON format and then build a collection out of it, i.e. [{json1},{json2},{json3}], you need to know that the collect() function treats each element as a plain string, which is wrong.

So, to get a collection of JSON-formatted output, wrap each element with the from_json function as well:

create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
create temporary function to_json as 'brickhouse.udf.json.ToJsonUDF';
create temporary function from_json as 'brickhouse.udf.json.FromJsonUDF';
SELECT collect(from_json(to_json(named_struct("locale", locale, "createdtime", created_time)), 'map<string,string>')) AS list_json_object
FROM my_table;  -- the table name is illustrative
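
The collected column is then an array<map<string,string>>, which Hive renders as a JSON-style list of objects, e.g. (values invented for illustration):

[{"locale":"en_US","createdtime":"2014-05-01"},{"locale":"fr_FR","createdtime":"2014-05-02"}]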

Upvotes: 0

ForeverLearner

Reputation: 2103

I was using a tool called Apache NiFi. It has a ConvertAvroToJSON processor, so Hive output that is in Avro format can easily be converted to JSON. The link below may be helpful: https://nifi.apache.org/

Upvotes: 1

Jerome Banks

Reputation: 1630

This seems to come up quite often. Use the to_json UDF from Brickhouse (http://github.com/klout/brickhouse). If you convert your results to a named_struct, it will interpret them as a JSON map and output accordingly.

SELECT to_json(named_struct("field1", field1,
                            "field2", field2,
                            "field3", field3))
FROM mytable;

to_json also interprets arrays and maps accordingly.
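
For illustration (a sketch reusing the table above; the literal values are made up and the expected output is shown as a comment):

SELECT to_json(named_struct("tags", array("a", "b"),
                            "attrs", map("k1", "v1")))
FROM mytable LIMIT 1;
-- expected output: {"tags":["a","b"],"attrs":{"k1":"v1"}}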

Upvotes: 13

Gianmario Spacagna

Reputation: 1300

The easiest way would be to first store the result as TSV and then use a library like spray to convert it to JSON.

Hive works fine as long as you don't try to customize too much. If you do have access to Spark in your cluster, use Spark SQL instead of Hive. The results of your query will be stored in a SchemaRDD, and from there you can easily map to JSON with .map(_.toJson), assuming you have spray imported in your scope.

Upvotes: 0

Shengjie

Reputation: 12786

My approach would be to use the Jackson library (http://jackson.codehaus.org/): you create a POJO that maps to the JSON format. Once you get the ResultSet from your Hive query, you iterate through it and create objects of the POJO using Jackson.

/**--JACKSON class--**/
public class Item {
    @JsonProperty
    private String att1;
    @JsonProperty
    private String att2;

    public Item(String att1, String att2) {
        this.att1 = att1;
        this.att2 = att2;
    }
}

/**--the class where you run your query--**/
List<Item> list = new ArrayList<Item>();
ResultSet rs = executeQuery(queryStr); // run your Hive query here
while (rs.next()) {
    String att1 = rs.getString("att1");
    String att2 = rs.getString("att2");
    list.add(new Item(att1, att2));
}

Then you can return the List<Item> as the result; Jackson allows you to write it in JSON format very easily:

  1. Create an ObjectMapper: ObjectMapper mapper = new ObjectMapper();. The mapper gives you a lot of options for writing the object in JSON format to different targets, e.g. an OutputStream, a ByteBuffer, etc.

  2. Iterate through the list.

  3. Use the mapper to write each value in JSON format, e.g. mapper.writeValue(out, item). In this example, out is an OutputStream.
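
Putting those steps together (a minimal sketch reusing the Item class above; the imports are for the Jackson 2 line, whereas the Codehaus link above is Jackson 1, where the class lives at org.codehaus.jackson.map.ObjectMapper):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ItemJsonWriter {
    public static void main(String[] args) throws IOException {
        List<Item> list = new ArrayList<Item>();
        list.add(new Item("a1", "a2"));
        ObjectMapper mapper = new ObjectMapper();
        // serialize the whole list as a JSON array in one call:
        System.out.println(mapper.writeValueAsString(list)); // [{"att1":"a1","att2":"a2"}]
        // or stream it straight to an OutputStream:
        mapper.writeValue(System.out, list);
    }
}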

Upvotes: 0
