David
David

Reputation: 103

How to union two exploded results when one is empty?

env: spark2.4.5

My spark sql:

SELECT A.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
UNION
SELECT B.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_1) exploded_a_values AS B

The explode structure A and B have the same schema. Error occurs when one of them is empty:

Can only star expand struct data types. Attribute: `ArrayBuffer)`;

Notice that elements in array is type of struct. My purpose is to pick out distinct elements in different array.

So how can I handles such empty case? I would be very grateful if you could give me some suggestion.

Upvotes: 0

Views: 310

Answers (1)

Som
Som

Reputation: 6338

when you explode an array using explode(table_0.Array_0) exploded_a_values AS A,

Here

  • exploded_a_values becomes table and
  • A becomes the column representing exploded column

Thus, you can't call A.* on that but of course you can call exploded_a_values.*

So your modified query will look like as below-

1. Read the input

     val table_0 =  spark.range(1, 5)
      .withColumn("Array_0", array(lit(1), lit(2)))
      .withColumn("Array_1", array(lit(null).cast(IntegerType)))
    table_0.show(false)
    table_0.printSchema()

Ouput-

+---+-------+-------+
|id |Array_0|Array_1|
+---+-------+-------+
|1  |[1, 2] |[]     |
|2  |[1, 2] |[]     |
|3  |[1, 2] |[]     |
|4  |[1, 2] |[]     |
+---+-------+-------+

root
 |-- id: long (nullable = false)
 |-- Array_0: array (nullable = false)
 |    |-- element: integer (containsNull = false)
 |-- Array_1: array (nullable = false)
 |    |-- element: integer (containsNull = true)

2. Run Union query

    table_0.createOrReplaceTempView("table_0")

    val processed = spark.sql(
      """
        |SELECT exploded_a_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
        |UNION
        |SELECT exploded_b_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_1) exploded_b_values AS B
      """.stripMargin)
    processed.show(false)
    processed.printSchema()

Output-

+----+---+
|A   |id |
+----+---+
|2   |2  |
|2   |4  |
|null|2  |
|null|4  |
|1   |1  |
|2   |1  |
|1   |2  |
|1   |3  |
|2   |3  |
|1   |4  |
|null|1  |
|null|3  |
+----+---+

root
 |-- A: integer (nullable = true)
 |-- id: long (nullable = false)

Note: Union can only be performed on tables with the compatible column types.

Edit-1 (As per comments)

Tried with Array<struct>, the same query worked well for me- Here is the result:

+------+---+
|A     |id |
+------+---+
|[a, 2]|1  |
|[a, 2]|2  |
|[a, 2]|4  |
|null  |2  |
|null  |4  |
|[a, 2]|3  |
|null  |1  |
|null  |3  |
+------+---+

root
 |-- A: struct (nullable = true)
 |    |-- f1: string (nullable = false)
 |    |-- f2: integer (nullable = false)
 |-- id: long (nullable = false)

For full example, please refer - this gist

Upvotes: 1

Related Questions