Kalpish Singhal

Reputation: 392

Achieve dynamic string interpolation in Scala Spark?

I have a string with the functions that need to go into the .agg function of my intended DataFrame. My data DataFrame looks like

val client = Seq((1,"A","D",10),(2,"A","D",5),(3,"B","C",56),(5,"B","D",67)).toDF("ID","Categ","subCat","Amnt")
+---+-----+------+----+
| ID|Categ|subCat|Amnt|
+---+-----+------+----+
|  1|    A|     D|  10|
|  2|    A|     D|   5|
|  3|    B|     C|  56|
|  5|    B|     D|  67|
+---+-----+------+----+

so I'm trying to interpolate this string

val str= "s"$count(ID) as Total,$sum(Amnt) as amt""

I want to achieve this as output

client.groupBy("Categ","subCat").agg(sum("Amnt") as "amt",count("ID") as "Total").show()
+-----+------+---+-----+
|Categ|subCat|amt|Total|
+-----+------+---+-----+
|    B|     C| 56|    1|
|    A|     D| 15|    2|
|    B|     D| 67|    1|
+-----+------+---+-----+

I tried this

 client.groupBy("Categ","subCat").agg(s"$str").show()

getting an error

> error: overloaded method value agg with alternatives:
>   (expr: org.apache.spark.sql.Column, exprs: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
>   (exprs: java.util.Map[String,String])org.apache.spark.sql.DataFrame
>   (exprs: scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame
>   (aggExpr: (String, String), aggExprs: (String, String)*)org.apache.spark.sql.DataFrame
> cannot be applied to (String)

I also tried expr:

    val str="sum(Amnt) as amt"
    client.groupBy("Categ","subCat").agg(expr(str)).show()
 this return the desired outcome
    +-----+------+---+
    |Categ|subCat|amt|
    +-----+------+---+
    |    B|     C| 56|
    |    A|     D| 15|
    |    B|     D| 67|
    +-----+------+---+

but again, when I try val str = "sum(Amnt) as amt,count(ID) as ID_tot" with

    client.groupBy("Categ","subCat").agg(expr(str)).show()

I get

    org.apache.spark.sql.catalyst.parser.ParseException:
    mismatched input ',' expecting <EOF> (line 1, pos 16)

Upvotes: 1

Views: 1680

Answers (3)

ajosh97

Reputation: 32

You can achieve the same by using a List of String expressions:

val str = List("sum(Amnt) as amt", "count(ID) as ID_tot")
ds.selectExpr(str: _*)
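
For the grouped aggregation in the question, the same list-of-expressions idea could look roughly like this (a sketch assuming the client DataFrame from the question, with each aggregate kept as a separate string so expr can parse it on its own):

import org.apache.spark.sql.functions.expr

// Assumption: the dynamic input can be supplied as a list of expression strings
// rather than a single comma-joined string.
val aggExprs = List("sum(Amnt) as amt", "count(ID) as Total").map(expr)

client
  .groupBy("Categ", "subCat")
  .agg(aggExprs.head, aggExprs.tail: _*)
  .show()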

Upvotes: 1

Alexey Romanov

Reputation: 170899

A somewhat crude solution: split by , and call expr on each piece:

val str = "sum(Amnt) as amt,count(ID) as ID_tot"
val (first, rest) = str.split(",").map(expr).splitAt(1)
client.groupBy("Categ","subCat").agg(first.head, rest: _*)

If , can be part of an expression (e.g. inside a string literal), it gets worse: you could try to parse it with expr, catch the ParseException and see where it ends, as sketched below. There should really be a more straightforward way, but I don't know it.
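
A rough, untested sketch of that try-parse idea (the helper name parseAggs is purely illustrative): greedily grow each comma-separated piece until expr accepts it, so a comma inside a function call or string literal does not split the expression.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.catalyst.parser.ParseException

// Illustrative sketch: keep appending pieces to the pending candidate
// until the parser accepts it, then start the next expression.
def parseAggs(s: String): Seq[Column] = {
  val (cols, leftover) = s.split(",").foldLeft((Vector.empty[Column], "")) {
    case ((done, pending), piece) =>
      val candidate = if (pending.isEmpty) piece else s"$pending,$piece"
      try { (done :+ expr(candidate), "") }
      catch { case _: ParseException => (done, candidate) }
  }
  require(leftover.isEmpty, s"could not parse: $leftover")
  cols
}

With the columns parsed this way, the call has the same shape as above: client.groupBy("Categ","subCat").agg(cols.head, cols.tail: _*).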

Upvotes: 3

bottaio

Reputation: 5093

You can use a different API to achieve what you want:

import org.apache.spark.sql.functions._
client
  .groupBy("Categ", "subCat")
  .agg(
    sum("Amnt").as("Amnt"), 
    count(lit("1")).as("Total"))
  .show()

Alternatively, go full Spark SQL, but I would discourage this pattern if the string comes from user input (it invites SQL injection).

spark.sql(s"select Categ, subCat, $str from client group by Categ, subCat")

Upvotes: 3
