Reputation: 103
env: spark2.4.5
My spark sql:
SELECT A.*
FROM table_0
LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
UNION
SELECT B.*
FROM table_0
LATERAL VIEW explode(table_0.Array_1) exploded_a_values AS B
The explode structure A and B have the same schema. Error occurs when one of them is empty:
Can only star expand struct data types. Attribute: `ArrayBuffer)`;
Notice that elements in array is type of struct. My purpose is to pick out distinct elements in different array.
So how can I handles such empty case? I would be very grateful if you could give me some suggestion.
Upvotes: 0
Views: 310
Reputation: 6338
when you explode an array using explode(table_0.Array_0) exploded_a_values AS A
,
Here
exploded_a_values
becomes table
andA
becomes the column representing exploded column
Thus, you can't call A.*
on that but of course you can call exploded_a_values.*
So your modified query will look like as below-
val table_0 = spark.range(1, 5)
.withColumn("Array_0", array(lit(1), lit(2)))
.withColumn("Array_1", array(lit(null).cast(IntegerType)))
table_0.show(false)
table_0.printSchema()
Ouput-
+---+-------+-------+
|id |Array_0|Array_1|
+---+-------+-------+
|1 |[1, 2] |[] |
|2 |[1, 2] |[] |
|3 |[1, 2] |[] |
|4 |[1, 2] |[] |
+---+-------+-------+
root
|-- id: long (nullable = false)
|-- Array_0: array (nullable = false)
| |-- element: integer (containsNull = false)
|-- Array_1: array (nullable = false)
| |-- element: integer (containsNull = true)
table_0.createOrReplaceTempView("table_0")
val processed = spark.sql(
"""
|SELECT exploded_a_values.*, table_0.id
|FROM table_0
| LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
|UNION
|SELECT exploded_b_values.*, table_0.id
|FROM table_0
| LATERAL VIEW explode(table_0.Array_1) exploded_b_values AS B
""".stripMargin)
processed.show(false)
processed.printSchema()
Output-
+----+---+
|A |id |
+----+---+
|2 |2 |
|2 |4 |
|null|2 |
|null|4 |
|1 |1 |
|2 |1 |
|1 |2 |
|1 |3 |
|2 |3 |
|1 |4 |
|null|1 |
|null|3 |
+----+---+
root
|-- A: integer (nullable = true)
|-- id: long (nullable = false)
Note: Union can only be performed on tables with the compatible column types.
Tried with Array<struct>
, the same query worked well for me-
Here is the result:
+------+---+
|A |id |
+------+---+
|[a, 2]|1 |
|[a, 2]|2 |
|[a, 2]|4 |
|null |2 |
|null |4 |
|[a, 2]|3 |
|null |1 |
|null |3 |
+------+---+
root
|-- A: struct (nullable = true)
| |-- f1: string (nullable = false)
| |-- f2: integer (nullable = false)
|-- id: long (nullable = false)
For full example, please refer - this gist
Upvotes: 1