minnu

Reputation: 57

Writing a generic function for withColumn in Spark Scala

I am creating a new DataFrame df using the withColumn conditions below. I need the same set of withColumn conditions for other DataFrames too. How can I write all of these withColumn conditions as a generic function and apply it across all DataFrames?

val df = sampledf.withColumn("concat", concat($"columna", $"columnb", $"columnc"))
         .withColumn("sub", $"columna" -  $"columnb")
         .withColumn("div", $"columna" / $"columnb")
         .withColumn("mul", $"columna" * $"columnb")

Upvotes: 1

Views: 167

Answers (2)

s.polam

Reputation: 10382

Use higher order functions.

Check below code.

Define a common function. `cols.reduce(f)` folds the binary column operation `f` across all of the supplied columns:

scala> import org.apache.spark.sql.Column
scala> def func(
    f: (Column, Column) => Column,
    cols: Column*
): Column = cols.reduce(f)

Sample DataFrame

scala> df.show(false)
+-------+-------+-------+
|columna|columnb|columnc|
+-------+-------+-------+
|1      |2      |3      |
+-------+-------+-------+

Creating Expressions.

scala> val colExpr = Seq(
     |     $"columna",
     |     $"columnb",
     |     $"columnc",
     |     func(concat(_,_),$"columna",$"columnb",$"columnc").as("concat"),
     |     func((_ / _),$"columna",$"columnb").as("div"),
     |     func((_ * _),$"columna",$"columnb").as("mul"),
     |     func((_ + _),$"columna",$"columnb").as("add"),
     |     func((_ - _),$"columna",$"columnb").as("sub")
     | )

Applying Expressions.

scala> df.select(colExpr:_*).show(false)
+-------+-------+-------+------+---+---+---+---+
|columna|columnb|columnc|concat|div|mul|add|sub|
+-------+-------+-------+------+---+---+---+---+
|1      |2      |3      |123   |0.5|2  |3  |-1 |
+-------+-------+-------+------+---+---+---+---+
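
Because colExpr only references column names, the same sequence can be reused on any DataFrame that has columna, columnb, and columnc (df2 below is a hypothetical second DataFrame):

```scala
// Assuming df2 is another DataFrame with the same columna/columnb/columnc
// columns, the same expression sequence applies without redefining anything:
val result2 = df2.select(colExpr: _*)
```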

Check this post for more details.

Upvotes: 0

Powers

Reputation: 19328

Here's a reusable function:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.concat

def yourFunction()(df: DataFrame): DataFrame = {
  df.withColumn("concat", concat($"columna", $"columnb", $"columnc"))
    .withColumn("sub", $"columna" - $"columnb")
    .withColumn("div", $"columna" / $"columnb")
    .withColumn("mul", $"columna" * $"columnb")
}

Here's how you can use the function:

val df = sampledf.transform(yourFunction())

See this post for more information about chaining DataFrame transformations with Spark. It's an important design pattern for writing clean Spark code.
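
If the column names differ between your DataFrames, the function can take the input columns as parameters (a sketch; withDerivedColumns and its parameter names are placeholders, not an established API):

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.concat

// Same transformations as above, but the input columns are passed in,
// so DataFrames with different column names can reuse the function.
def withDerivedColumns(colA: Column, colB: Column, colC: Column)(df: DataFrame): DataFrame = {
  df.withColumn("concat", concat(colA, colB, colC))
    .withColumn("sub", colA - colB)
    .withColumn("div", colA / colB)
    .withColumn("mul", colA * colB)
}

// Usage with transform, same pattern as yourFunction:
val df = sampledf.transform(withDerivedColumns($"columna", $"columnb", $"columnc"))
```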

Upvotes: 3
