Reputation: 147
I have a table in Hive with multiple fields. For example:
-------------------------------------------------------------
|Primary Key|Attribute 1|Attribute 2|Attribute 3|Attribute 4|
-------------------------------------------------------------
I need to query the table using Hive SQL and return a JSON object embedded in CSV output. For example:
Primary Key, Attribute 1, {"Primary Key":"", "Attribute 2":"", "Attribute 3":""}, Attribute 4
I don't have any post-query programming step available to convert the values to JSON format. I have no problem with straight SQL; I just don't know if I can get the JSON object out using SQL.
Any ideas greatly appreciated.
Upvotes: 2
Views: 4081
Reputation: 507
I used the brickhouse JAR (brickhouse-0.6.0.jar) for JSON output to achieve something like this. It is the best JAR available for the job, but if you need the keys in the JSON to come out in camelCase, an extra element needs to be put in the to_json(named_struct()) clause. You can read more about this JAR here. Below is the code snippet showing how I did it.
ADD JAR path/to/jar/brickhouse-0.6.0.jar;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';

CREATE TABLE IF NOT EXISTS V2
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
AS
SELECT
  employeeId,
  -- the trailing 'true' flag tells to_json to convert the snake_case
  -- field names in named_struct to camelCase keys
  to_json(named_struct('camel_case', 1, 'employee_id', employeeId, 'manager_id', msgId, 'org_Id', orgId), true) AS jsonString
FROM
  employee
WHERE
  employeeId = 101;
The output would be like this:
101, {"camelCase" : true, "employeeId" : 101, "managerId" : 201, "orgId" : 301}
I also had to handle the case where I had an array of JSON elements, which needed to be wrapped in square brackets ([]). If you are sure you have a single JSON object per record, you can ignore the outermost layer of the query below (i.e. SELECT FROM y) and just use the queries up to SELECT FROM x. Here is the query, changed to handle an array of JSON:
CREATE TABLE IF NOT EXISTS V2
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
AS
SELECT
  -- collect_list returns array<string>; join the elements before wrapping in [ ]
  y.employeeId, CONCAT('[', concat_ws(',', y.jsonData), ']') AS jsonData
FROM
(
  SELECT
    x.employeeId, collect_list(jsonString) AS jsonData
  FROM (
    SELECT
      employeeId,
      to_json(named_struct('camel_case', 1, 'employee_id', employeeId, 'manager_id', msgId, 'org_Id', orgId), true) AS jsonString
    FROM
      employee
    WHERE
      employeeId = 101
  ) x
  GROUP BY
    x.employeeId
) y;
The output would be like this:
101, [{"camelCase" : true, "employeeId" : 101, "managerId" : 201, "orgId" : 301}]
You can tweak this query to write the data directly to HDFS using this version:
INSERT OVERWRITE DIRECTORY '/path/of/target/directory/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
SELECT ...
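For completeness, a sketch of the full statement with the single-object query from above plugged in (same table and column assumptions):
INSERT OVERWRITE DIRECTORY '/path/of/target/directory/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
SELECT
  employeeId,
  to_json(named_struct('camel_case', 1, 'employee_id', employeeId, 'manager_id', msgId, 'org_Id', orgId), true) AS jsonString
FROM
  employee
WHERE
  employeeId = 101;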
Upvotes: 1