Reputation: 5336
Is there any way to convert the Hive query result to JSON format?
Upvotes: 22
Views: 27807
Reputation: 1
SELECT to_json(...) will work, but all the columns have to be listed explicitly.
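That is, something along these lines, naming every column by hand (the field and table names are placeholders; see the accepted answer below):
SELECT to_json( named_struct( "field1", field1, "field2", field2 ) )
FROM mytable;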
Upvotes: 0
Reputation: 6023
The top answer works. However, I'm a bit too lazy to explicitly convert to a named_struct with all the column names. Also, the Brickhouse UDF crashes when I use CREATE TABLE AS SELECT. To solve this issue, I developed a similar UDF which supports SELECT to_json(*) FROM tbl;
Usage:
ADD JAR ivy://org.jsonex:HiveUDF:0.1.22?transitive=true;
CREATE TEMPORARY FUNCTION to_json AS 'org.jsonex.hiveudf.ToJsonUDF';
SELECT to_json(*) FROM tbl
Upvotes: 0
Reputation: 2727
A list of JSON objects:
If you want to convert the output to JSON format and then make a collection out of it, i.e. [{json1},{json2},{json3}], you need to know that the collect() function treats the values as plain strings, which is wrong here.
So to get a collection of JSON-formatted output, you need to wrap the values with the from_json function too (a full query sketch follows the snippet):
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
create temporary function to_json as 'brickhouse.udf.json.ToJsonUDF';
create temporary function from_json as 'brickhouse.udf.json.FromJsonUDF';
collect(from_json(to_json(named_struct("locale", locale, "createdtime", created_time)),
        'map<string,string>')) AS list_json_object
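Put together, a complete query might look like the sketch below. The table name some_table and the user_id grouping key are made up for illustration; collect is a UDAF, so it needs a GROUP BY:
SELECT user_id,
       collect(from_json(to_json(named_struct("locale", locale, "createdtime", created_time)),
               'map<string,string>')) AS list_json_object
FROM some_table   -- hypothetical table
GROUP BY user_id; -- hypothetical grouping key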
Upvotes: 0
Reputation: 2103
I was using a tool called Apache NiFi. It has a ConvertAvroToJSON processor, so Hive output in Avro format can easily be converted to JSON. The link below will be helpful: https://nifi.apache.org/
Upvotes: 1
Reputation: 1630
This seems to come up quite often. Use the to_json UDF from Brickhouse (http://github.com/klout/brickhouse). If you convert your results to a named_struct, it will interpret it as a JSON map and produce output accordingly.
SELECT to_json( named_struct( "field1", field1 ,
"field2", field2,
"field3", field3 ) )
FROM mytable;
The to_json will also interpret arrays and maps accordingly.
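For example, a quick sketch (assuming the Brickhouse jar has been added; the field names and literal values are made up for illustration):
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';
SELECT to_json( named_struct( "id", 1,
                              "tags", array("a", "b"),
                              "props", map("k", "v") ) );
-- should emit something like {"id":1,"tags":["a","b"],"props":{"k":"v"}}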
Upvotes: 13
Reputation: 1300
The easiest way would be to first store the result as TSV and then use a library like spray-json to convert it to JSON.
Hive works fine as long as you don't try to customize too much. If you do have access to Spark in your cluster, use Spark SQL instead of Hive. The result of your query will be a SchemaRDD, and from there you can easily map it to JSON by doing .map(_.toJson), assuming you have spray-json imported in scope.
Upvotes: 0
Reputation: 12786
In my experience, I would use the Jackson library (http://jackson.codehaus.org/): you create a POJO that maps to the JSON format. Then, once you get the ResultSet from your Hive query, you iterate through it and build instances of the POJO, which Jackson can then serialize.
/**--JACKSON Class--**/
import org.codehaus.jackson.annotate.JsonProperty; // Jackson 1.x, per the link above; for Jackson 2.x use com.fasterxml.jackson.annotation.JsonProperty

public class Item {
    @JsonProperty
    private String att1;
    @JsonProperty
    private String att2;

    public Item(String att1, String att2) {
        this.att1 = att1;
        this.att2 = att2;
    }
}
/**--the class where you run your query--**/
List<Item> list = new ArrayList<Item>();
ResultSet rs = executeQuery(queryStr); // run your Hive query here (e.g. via the Hive JDBC driver)
while (rs.next()) {                     // next() advances the JDBC cursor
    String att1 = rs.getString("att1"); // ResultSet uses typed getters such as getString
    String att2 = rs.getString("att2");
    Item item = new Item(att1, att2);
    list.add(item);
}
Then you can return the List of Item as the result; Jackson allows you to write it out in JSON format very easily:
Create an ObjectMapper: ObjectMapper mapper = new ObjectMapper();. The mapper gives you a lot of options for writing the object in JSON format to different targets, e.g. an OutputStream, a ByteBuffer, etc.
Use the mapper to write the value in JSON format, e.g. mapper.writeValue(out, list), where out is an OutputStream. Passing the whole list serializes it as a JSON array in one call (by default writeValue closes the stream when done, so writing the items one by one in a loop would fail after the first element).
Upvotes: 0