JohnTrung

Reputation: 109

How to add leading zero padding to a specified number of digits in Scala Spark?

I have a data.txt file as below.

12, 345, 6789

Now I want to pad specified fields with leading zeros to a given number of digits, where the input comes from a file passed as an argument or from standard input. Here the target width is 8 digits. How should I do this?

This is my code:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._

//Convert textfile to DF
val conf = new SparkConf().setAppName("ct").setMaster("local").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val sparkSess = SparkSession.builder().appName("SparkSessionZipsExample").config(conf).getOrCreate()
import sparkSess.implicits._
val path = "data.txt"
val data = sc.textFile(path)
val colNum = data.first().split(",").size
var schemaString = "key"
for( i <- 1 to colNum - 1) {
 schemaString += " value" + i
}
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable=true))
val schema = StructType(fields)
val dfWithSchema = sparkSess.read.option("header", "false").schema(schema).csv(path)
dfWithSchema.show()

//add leading zero padding with the specified number of digits
//The number of digits specified in the specified field of the argument file is 8 digits
val df = dfWithSchema.withColumn("key", format_string("%08d", $"key")).show
val df2 = dfWithSchema.withColumn("value2", format_string("%08d", $"value2")).show

But the output is incorrect.

I want the output shown below. Please help me.

+---------+------+---------+
|key      |value1|value2   |
+---------+------+---------+
| 00000012|   345| 00006789|
+---------+------+---------+

Upvotes: 4

Views: 14542

Answers (1)

abiratsis

Reputation: 7316

You can use the built-in lpad function as shown below:

import org.apache.spark.sql.functions.lpad

dfWithSchema.select(
  lpad($"key", 8, "0"),
  lpad($"value2", 8, "0"),
  $"value1"
).show

This left-pads each string with 0s up to a total length of 8 characters. Note that lpad shortens strings longer than 8 characters to 8.
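One thing to watch with the sample line `12, 345, 6789`: Spark's CSV reader does not strip leading whitespace by default, so the fields after the first are probably read as " 345" and " 6789", and padding them directly would leave the space in the middle. The sketch below trims before padding and uses withColumn so the original column order and names are kept. It also shows a format_string variant, on the assumption that the original attempt failed because %08d needs a numeric argument while the columns are StringType. The object and helper names (ZeroPad, padWithLpad, padWithFormat) are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, format_string, lpad, trim}

// Hypothetical helpers for illustration only; not part of the Spark API.
object ZeroPad {

  // Trim each listed column, then left-pad it with "0" to `width` characters.
  // lpad shortens values that are longer than `width`.
  def padWithLpad(df: DataFrame, columns: Seq[String], width: Int): DataFrame =
    columns.foldLeft(df) { (acc, c) =>
      acc.withColumn(c, lpad(trim(col(c)), width, "0"))
    }

  // Alternative: cast to int so that a "%0<width>d" format gets a numeric argument.
  // Assumes the values are integers that fit in an Int.
  def padWithFormat(df: DataFrame, columns: Seq[String], width: Int): DataFrame =
    columns.foldLeft(df) { (acc, c) =>
      acc.withColumn(c, format_string(s"%0${width}d", trim(col(c)).cast("int")))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ZeroPadExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the question's data, including the space after each comma.
    val df = Seq(("12", " 345", " 6789")).toDF("key", "value1", "value2")

    padWithLpad(df, Seq("key", "value2"), 8).show()
    padWithFormat(df, Seq("key", "value2"), 8).show()

    spark.stop()
  }
}
```

Both variants should leave value1 untouched and turn key and value2 into 8-character, zero-padded strings. The lpad version works on any string, while the format_string version only makes sense for values that really are integers.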

Please refer here for details.

Upvotes: 9
