Reputation: 392
I have a string with the functions that need to go into .agg
functions of my intended dataframe.
MY data dataframe looks like
val client = Seq((1,"A","D",10),(2,"A","D",5),(3,"B","C",56),(5,"B","D",67)).toDF("ID","Categ","subCat","Amnt")
+---+-----+------+----+
| ID|Categ|subCat|Amnt|
+---+-----+------+----+
| 1| A| D| 10|
| 2| A| D| 5|
| 3| B| C| 56|
| 5| B| D| 67|
+---+-----+------+----+
so I'm tryin to interpolate this sting
val str= "s"$count(ID) as Total,$sum(Amnt) as amt""
I want to achieve this as output
client.groupBy("Categ","subCat").agg(sum("Amnt") as "amt",count("ID") as "Total").show()
+-----+------+---+-----+
|Categ|subCat|amt|Total|
+-----+------+---+-----+
| B| C| 56| 1|
| A| D| 15| 2|
| B| D| 67| 1|
+-----+------+---+-----+
I tried this
client.groupBy("Categ","subCat").agg(s"$str").show()
getting an error
> error: overloaded method value agg with alternatives:
(expr: org.apache.spark.sql.Column,exprs: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
(exprs: java.util.Map[String,String])org.apache.spark.sql.DataFrame (exprs: scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame (aggExpr: (String, String),aggExprs: (String, String)*)org.apache.spark.sql.DataFrame cannot be applied to (String)
I also Tried expr
val str="sum(Amnt) as amt"
client.groupBy("Categ","subCat").agg(expr(str)).show()
this return the desired outcome
+-----+------+---+
|Categ|subCat|amt|
+-----+------+---+
| B| C| 56|
| A| D| 15|
| B| D| 67|
+-----+------+---+
but again when i try
val str="sum(Amnt) as amt,count(ID) as ID_tot"
client.groupBy("Categ","subCat").agg(expr(str)).show()
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input ',' expecting <EOF>(line 1, pos 16)
Upvotes: 1
Views: 1680
Reputation: 32
You can achieve the same by using List of String as expression -
val str=List("sum(Amnt) as amt,count(ID) as ID_tot")
ds.selectExpr(str:_*)
Upvotes: 1
Reputation: 170899
A bit crude solution: split by ,
and call expr
on each:
val str="sum(Amnt) as amt,count(ID) as ID_tot"
val (first, rest) = str.split(",").map(expr).splitAt(1)
client.groupBy("Categ","subCat").agg(first, rest: _*)
If ,
can be a part of an expression (e.g. inside a string literal), it gets worse: try to parse it with expr
, catch ParseException
and see where it ends? There should really be a more straightforward way, but I don't know it.
Upvotes: 3
Reputation: 5093
You can use different API to achieve what you want
import org.apache.spark.sql.functions._
client
.groupBy("Categ", "subCat")
.agg(
sum("Amnt").as("Amnt"),
count(lit("1")).as("Total"))
.show()
Alternatively, go full SparkSQL but I would discourage this pattern if you are trying to inject user's input.
spark.sql(s"select Categ, subCat, $str from client group by Categ, subCat")
Upvotes: 3