J.Dan

Reputation: 83

Spark HiveContext: get the same output format as a Hive client SELECT

When a Hive table has values like maps or arrays, selecting them in the Hive client shows them as JSON, e.g.: {"a":1,"b":1} or [1,2,2].

When you select those in Spark, they are map/array objects in the DataFrame. If you stringify each row, they appear as Map("a" -> 1, "b" -> 1) or WrappedArray(1, 2, 2).

I want to have the same format as the Hive client when using Spark's HiveContext.

How can I do this?

Upvotes: 0

Views: 57

Answers (1)

stefanobaghino

Reputation: 12794

Spark has its own functions to convert complex objects into their JSON representation.

The org.apache.spark.sql.functions object provides the to_json function, which its documentation describes as follows:

Converts a column containing a StructType, ArrayType of StructTypes, a MapType or ArrayType of MapTypes into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

Here is a short example, as run in the spark-shell:

scala> val df = spark.createDataFrame(
     |   Seq(("hello", Map("a" -> 1)), ("world", Map("b" -> 2)))
     | ).toDF("name", "map")
df: org.apache.spark.sql.DataFrame = [name: string, map: map<string,int>]

scala> df.show
+-----+-----------+
| name|        map|
+-----+-----------+
|hello|Map(a -> 1)|
|world|Map(b -> 2)|
+-----+-----------+

scala> df.select($"name", to_json(struct($"map")) as "json").show
+-----+---------------+
| name|           json|
+-----+---------------+
|hello|{"map":{"a":1}}|
|world|{"map":{"b":2}}|
+-----+---------------+

Here is a similar example, with arrays instead of maps:

scala> val df = spark.createDataFrame(
     |   Seq(("hello", Seq("a", "b")), ("world", Seq("c", "d")))
     | ).toDF("name", "array")
df: org.apache.spark.sql.DataFrame = [name: string, array: array<string>]

scala> df.select($"name", to_json(struct($"array")) as "json").show
+-----+-------------------+
| name|               json|
+-----+-------------------+
|hello|{"array":["a","b"]}|
|world|{"array":["c","d"]}|
+-----+-------------------+
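
Note that wrapping the column in struct(...) adds an outer key ("map" or "array") that the Hive client output does not have. A minimal sketch of one way to avoid that is to pass the column to to_json directly. This assumes Spark 2.4 or later, where to_json is understood to accept map and array columns on their own; the object name ToJsonSketch, the render helper, and the sample data are hypothetical and only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_json}

// Hypothetical wrapper object, for illustration only.
object ToJsonSketch {

  // Builds a small DataFrame with a map and an array column and renders
  // both as JSON strings without the struct(...) wrapper.
  def render(spark: SparkSession): Array[(String, String, String)] = {
    import spark.implicits._

    val df = Seq(
      ("hello", Map("a" -> 1), Seq(1, 2, 2)),
      ("world", Map("b" -> 2), Seq(3))
    ).toDF("name", "map", "array")

    // Assumption: Spark 2.4+, where to_json takes MapType and ArrayType
    // columns directly (older versions may reject the array column).
    df.select(
        col("name"),
        to_json(col("map")).as("map_json"),
        to_json(col("array")).as("array_json")
      )
      .as[(String, String, String)] // tuple columns are mapped by position
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in the spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-json-sketch")
      .getOrCreate()
    try render(spark).foreach(println)
    finally spark.stop()
  }
}
```

With this approach the map column should come out as {"a":1} and the array column as [1,2,2], which is the shape shown by the Hive client in the question. Other column types (structs, timestamps, nested values) may still be formatted differently from Hive, so it is worth checking them against your own tables.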

Upvotes: 1
