How to give alias name for posexplode columns in Spark SQL?

Question

The below statement generates "pos" and "col" as default column names when I use posexplode() function in Spark SQL.

scala> spark.sql(""" with t1(select to_date('2019-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) from t1 """).show(false)
+----------+----------+---+---+
|first_day |last_day  |pos|col|
+----------+----------+---+---+
|2019-01-01|2019-01-31|0  |5  |
|2019-01-01|2019-01-31|1  |6  |
|2019-01-01|2019-01-31|2  |7  |
+----------+----------+---+---+

What is the syntax to override those default names in Spark SQL? In dataframes, this can be done by giving df.explode(select 'arr.as(Seq("arr_val","arr_pos")))

scala> val arr= Array(5,6,7)
arr: Array[Int] = Array(5, 6, 7)

scala> Seq(("dummy")).toDF("x").select(posexplode(lit(arr)).as(Seq("arr_val","arr_pos"))).show(false)
+-------+-------+
|arr_val|arr_pos|
+-------+-------+
|0      |5      |
|1      |6      |
|2      |7      |
+-------+-------+

How to get that in SQL?
I have unsuccessfully tried:

spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(Seq('p','c')) from t1 """).show(false)

and

spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(('p','c')) from t1 """).show(false)

user10938362 · Accepted Answer

You can either use LATERAL VIEW:

spark.sql("""
  WITH t1 AS (SELECT to_date('2011-01-01') first_day)
  SELECT first_day, date_sub(add_months(first_day,1),1) last_day, p, c
  FROM t1
  LATERAL VIEW  posexplode(array(5,6,7)) AS p, c
""").show

+----------+----------+---+---+
| first_day|  last_day|  p|  c|
+----------+----------+---+---+
|2011-01-01|2011-01-31|  0|  5|
|2011-01-01|2011-01-31|  1|  6|
|2011-01-01|2011-01-31|  2|  7|
+----------+----------+---+---+

or a tuple of aliases

spark.sql("""
  WITH t1 AS (SELECT to_date('2011-01-01') first_day)
  SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
         posexplode(array(5,6,7)) AS (p, c) 
  FROM t1 
""").show

+----------+----------+---+---+
| first_day|  last_day|  p|  c|
+----------+----------+---+---+
|2011-01-01|2011-01-31|  0|  5|
|2011-01-01|2011-01-31|  1|  6|
|2011-01-01|2011-01-31|  2|  7|
+----------+----------+---+---+

Tested with Spark 2.4.0.

Please note that aliases are not strings, and shouldn't be quoted with ' or ". If you have to use non-standard identifiers you should use backticks, i.e.

WITH t1 AS (SELECT to_date('2011-01-01') first_day)
SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
       posexplode(array(5,6,7)) AS (`arr pos`, `arr_value`) 
FROM t1

How to give alias name for posexplode columns in Spark SQL?

Answers (2)

Related Questions