Reputation: 1
I am selecting two Column
s from Dataframe col1
and col2
.
df.select((col("a")+col("b")).as("sum_col")
now user wants this sum_col
to be fixed spaces to 4.
so length of a
and b
is 2 hence max value can come less than 100 (two) or more than 100(three) so need to do it conditionally to add 1 or 2 spaces with it.
can anyone tell me how to handle within select
block with cinditional logic to cast the Column
to concat
and decide one or two spaces to be added
Upvotes: 0
Views: 120
Reputation:
Just use format_string
function
import org.apache.spark.sql.functions.format_string
val df = Seq(1, 10, 100).toDF("sum_col")
val result = df.withColumn("sum_col_fmt", format_string("%4d", $"sum_col"))
And proof it works:
result.withColumn("proof", concat(lit("'"), $"sum_col_fmt", lit("'"))).show
// +-------+-----------+------+
// |sum_col|sum_col_fmt| proof|
// +-------+-----------+------+
// | 1| 1|' 1'|
//| 10| 10|' 10'|
// | 100| 100|' 100'|
// +-------+-----------+------+
Upvotes: 3
Reputation: 8711
Define a UDF and then register it. I added a dot in front of the format so that it can be shown in the output. Check this out
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala> val df = spark.range(1,20).toDF("col1")
df: org.apache.spark.sql.DataFrame = [col1: bigint]
scala> val df2 = df.withColumn("newcol", 'col1 + 'col1)
df2: org.apache.spark.sql.DataFrame = [col1: bigint, newcol: bigint]
scala> def myPadding(a:String):String =
| return ".%4s".format(a)
myPadding: (a: String)String
scala> val myUDFPad = udf( myPadding(_:String):String)
myUDFPad: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
scala> df2.select(myUDFPad(df2("newcol"))).show
+-----------+
|UDF(newcol)|
+-----------+
| . 2|
| . 4|
| . 6|
| . 8|
| . 10|
| . 12|
| . 14|
| . 16|
| . 18|
| . 20|
| . 22|
| . 24|
| . 26|
| . 28|
| . 30|
| . 32|
| . 34|
| . 36|
| . 38|
+-----------+
scala>
Upvotes: 0
Reputation: 7207
UDF with String.format:
val df = List((1, 2)).toDF("a", "b")
val leadingZeroes = (value: Integer) => String.format("%04d", value)
val leadingZeroesUDF = udf(leadingZeroes)
val result = df.withColumn("sum_col", leadingZeroesUDF($"a" + $"b"))
result.show(false)
Output:
+---+---+-------+
|a |b |sum_col|
+---+---+-------+
|1 |2 |0003 |
+---+---+-------+
Upvotes: 0