Reputation: 8711
The below statement generates "pos" and "col" as default column names when I use posexplode()
function in Spark SQL.
scala> spark.sql(""" with t1(select to_date('2019-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) from t1 """).show(false)
+----------+----------+---+---+
|first_day |last_day |pos|col|
+----------+----------+---+---+
|2019-01-01|2019-01-31|0 |5 |
|2019-01-01|2019-01-31|1 |6 |
|2019-01-01|2019-01-31|2 |7 |
+----------+----------+---+---+
What is the syntax to override those default names in Spark SQL?
In dataframes, this can be done by giving df.explode(select 'arr.as(Seq("arr_val","arr_pos")))
scala> val arr= Array(5,6,7)
arr: Array[Int] = Array(5, 6, 7)
scala> Seq(("dummy")).toDF("x").select(posexplode(lit(arr)).as(Seq("arr_val","arr_pos"))).show(false)
+-------+-------+
|arr_val|arr_pos|
+-------+-------+
|0 |5 |
|1 |6 |
|2 |7 |
+-------+-------+
How to get that in SQL?
I have unsuccessfully tried:
spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(Seq('p','c')) from t1 """).show(false)
and
spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(('p','c')) from t1 """).show(false)
Upvotes: 8
Views: 11042
Reputation: 24356
The following options work well in Spark SQL:
posexplode(col_name) as (aa, bb)
posexplode(col_name) (aa, bb)
Upvotes: 0
Reputation: 4151
You can either use LATERAL VIEW
:
spark.sql("""
WITH t1 AS (SELECT to_date('2011-01-01') first_day)
SELECT first_day, date_sub(add_months(first_day,1),1) last_day, p, c
FROM t1
LATERAL VIEW posexplode(array(5,6,7)) AS p, c
""").show
+----------+----------+---+---+
| first_day| last_day| p| c|
+----------+----------+---+---+
|2011-01-01|2011-01-31| 0| 5|
|2011-01-01|2011-01-31| 1| 6|
|2011-01-01|2011-01-31| 2| 7|
+----------+----------+---+---+
or a tuple of aliases
spark.sql("""
WITH t1 AS (SELECT to_date('2011-01-01') first_day)
SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
posexplode(array(5,6,7)) AS (p, c)
FROM t1
""").show
+----------+----------+---+---+
| first_day| last_day| p| c|
+----------+----------+---+---+
|2011-01-01|2011-01-31| 0| 5|
|2011-01-01|2011-01-31| 1| 6|
|2011-01-01|2011-01-31| 2| 7|
+----------+----------+---+---+
Tested with Spark 2.4.0.
Please note that aliases are not strings, and shouldn't be quoted with '
or "
. If you have to use non-standard identifiers you should use backticks, i.e.
WITH t1 AS (SELECT to_date('2011-01-01') first_day)
SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
posexplode(array(5,6,7)) AS (`arr pos`, `arr_value`)
FROM t1
Upvotes: 9