Rajesh Katkar
Rajesh Katkar

Reputation: 1

spark - conditinal statements inside select

I am selecting two Columns from Dataframe col1 and col2.

df.select((col("a")+col("b")).as("sum_col")

now user wants this sum_col to be fixed spaces to 4. so length of a and b is 2 hence max value can come less than 100 (two) or more than 100(three) so need to do it conditionally to add 1 or 2 spaces with it.

can anyone tell me how to handle within select block with cinditional logic to cast the Column to concat and decide one or two spaces to be added

Upvotes: 0

Views: 120

Answers (3)

user10457060
user10457060

Reputation:

Just use format_string function

import org.apache.spark.sql.functions.format_string

val df = Seq(1, 10, 100).toDF("sum_col")
val result = df.withColumn("sum_col_fmt", format_string("%4d", $"sum_col"))

And proof it works:

result.withColumn("proof", concat(lit("'"), $"sum_col_fmt", lit("'"))).show
// +-------+-----------+------+
// |sum_col|sum_col_fmt| proof|
// +-------+-----------+------+
// |      1|          1|'   1'|
//|     10|         10|'  10'|
// |    100|        100|' 100'|
// +-------+-----------+------+

Upvotes: 3

stack0114106
stack0114106

Reputation: 8711

Define a UDF and then register it. I added a dot in front of the format so that it can be shown in the output. Check this out

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = spark.range(1,20).toDF("col1")
df: org.apache.spark.sql.DataFrame = [col1: bigint]

scala>  val df2 = df.withColumn("newcol", 'col1 + 'col1)
df2: org.apache.spark.sql.DataFrame = [col1: bigint, newcol: bigint]

scala> def myPadding(a:String):String =
     | return ".%4s".format(a)
myPadding: (a: String)String

scala> val myUDFPad = udf( myPadding(_:String):String)
myUDFPad: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

scala> df2.select(myUDFPad(df2("newcol"))).show
+-----------+
|UDF(newcol)|
+-----------+
|      .   2|
|      .   4|
|      .   6|
|      .   8|
|      .  10|
|      .  12|
|      .  14|
|      .  16|
|      .  18|
|      .  20|
|      .  22|
|      .  24|
|      .  26|
|      .  28|
|      .  30|
|      .  32|
|      .  34|
|      .  36|
|      .  38|
+-----------+


scala>

Upvotes: 0

pasha701
pasha701

Reputation: 7207

UDF with String.format:

val df = List((1, 2)).toDF("a", "b")

val leadingZeroes = (value: Integer) => String.format("%04d", value)
val leadingZeroesUDF = udf(leadingZeroes)

val result = df.withColumn("sum_col", leadingZeroesUDF($"a" + $"b"))

result.show(false)

Output:

+---+---+-------+
|a  |b  |sum_col|
+---+---+-------+
|1  |2  |0003   |
+---+---+-------+

Upvotes: 0

Related Questions