Reputation: 507
Is there an easy way to run SQL on multiple columns in Spark SQL?
For example, let's say I have a query that should be applied to most columns:
select
min(c1) as min,
max(c1) as max,
max(c1) - min(c1) as range
from tb1
If there are multiple columns, is there a way to execute the query for all of them and get the results in one pass, similar to what df.describe does?
Upvotes: 1
Views: 824
Reputation: 5792
Use the metadata (the columns, in this case) included in your DataFrame (which you can get via spark.table("<table_name>") if you don't have it in scope already) to get the column names, then build the expressions you want and pass them to df.select (or df.selectExpr).
Build some test data:
scala> val r = new scala.util.Random
scala> var seq = Seq[(Int, Int, Float)]()
seq: Seq[(Int, Int, Float)] = List()
scala> (1 to 1000).foreach(n => { seq = seq :+ (n, r.nextInt, r.nextFloat) })
scala> val df = seq.toDF("id", "some_int", "some_float")
Denote some functions we want to run on all the columns:
scala> val functions_to_apply = Seq("min", "max")
functions_to_apply: Seq[String] = List(min, max)
Set up the final Seq of SQL Columns:
scala> var select_columns = Seq[org.apache.spark.sql.Column]()
select_columns: Seq[org.apache.spark.sql.Column] = List()
Iterate over the columns and the functions to apply, populating the select_columns Seq (expr comes from org.apache.spark.sql.functions):
scala> import org.apache.spark.sql.functions.expr
scala> val cols = df.columns
scala> cols.foreach(col => { functions_to_apply.foreach(f => { select_columns = select_columns :+ expr(s"$f($col)") }) })
Run the actual query:
scala> df.select(select_columns:_*).show
+-------+-------+-------------+-------------+---------------+---------------+
|min(id)|max(id)|min(some_int)|max(some_int)|min(some_float)|max(some_float)|
+-------+-------+-------------+-------------+---------------+---------------+
| 1| 1000| -2143898568| 2147289642| 1.8781424E-4| 0.99964607|
+-------+-------+-------------+-------------+---------------+---------------+
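As a side note, the two mutable loops above can be collapsed into a single immutable flatMap, and you can add one more expression per column to cover the range (max - min) asked about in the question. This is a sketch in the same spirit as the code above; the alias names (e.g. min_id, range_id) are my own choice, not anything Spark requires:

```scala
import org.apache.spark.sql.functions.expr

// One expression per (column, function) pair, plus a range expression
// per column, each aliased so the output column names stay readable.
val select_columns = df.columns.flatMap { c =>
  functions_to_apply.map(f => expr(s"$f($c) as ${f}_$c")) :+
    expr(s"max($c) - min($c) as range_$c")
}

df.select(select_columns: _*).show()
```

This produces the same min/max columns as the loop version, with range_id, range_some_int, and range_some_float added alongside them.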
Upvotes: 1