Reputation: 109
I have a data.txt file as below.
12, 345, 6789
Now I want to pad specified fields of the argument file (or standard input) with leading zeros up to a given number of digits. In this case the target width is 8 digits. What should I do?
This is my code:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
//Convert textfile to DF
val conf = new SparkConf().setAppName("ct").setMaster("local").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val sparkSess = SparkSession.builder().appName("SparkSessionZipsExample").config(conf).getOrCreate()
import sparkSess.implicits._ // enables the $"colName" syntax used below
val path = "data.txt"
val data = sc.textFile(path)
val colNum = data.first().split(",").size
var schemaString = "key"
for (i <- 1 to colNum - 1) {
  schemaString += " value" + i
}
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable=true))
val schema = StructType(fields)
val dfWithSchema = sparkSess.read.option("header", "false").schema(schema).csv(path)
dfWithSchema.show()
//add leading zero padding with the specified number of digits
//The number of digits specified in the specified field of the argument file is 8 digits
val df = dfWithSchema.withColumn("key", format_string("%08d", $"key")).show
val df2 = dfWithSchema.withColumn("value2", format_string("%08d", $"value2")).show
But the output is incorrect.
The desired output is shown below. Please help me.
+---------+------+---------+
|key      |value1|value2   |
+---------+------+---------+
| 00000012|   345| 00006789|
+---------+------+---------+
Upvotes: 4
Views: 14542
Reputation: 7316
You can use the built-in lpad function as shown below:
import org.apache.spark.sql.functions.lpad
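dfWithSchema.select(
  lpad($"key", 8, "0"),     // left-pad key with "0" to 8 characters
  lpad($"value2", 8, "0"),  // left-pad value2 with "0" to 8 characters
  $"value1"                 // value1 is left unchanged
).show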
This pads the string on the left with 0s until it is 8 characters long. Note that a string longer than 8 characters is truncated to 8 characters.
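If the original column names and column order should be preserved, the lpad call can be wrapped in a small helper. The following is only a sketch: the object name LpadSketch and the helper padColumns are hypothetical, and the trim call assumes that fields read from a line such as "12, 345, 6789" carry a leading space that should not count toward the 8 characters.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lpad, trim}

// Hypothetical helper, not part of Spark: pads the named columns and keeps
// every column under its original name and in its original position.
object LpadSketch {
  def padColumns(df: DataFrame, toPad: Set[String], width: Int): DataFrame =
    df.select(df.columns.map { name =>
      // Assumption: surrounding whitespace is not meaningful, so strip it first.
      val trimmed = trim(col(name))
      val result  = if (toPad.contains(name)) lpad(trimmed, width, "0") else trimmed
      result.as(name) // without the alias the column would be named after the expression
    }: _*)
}
```

With the DataFrame from the question, a call such as LpadSketch.padColumns(dfWithSchema, Set("key", "value2"), 8).show should print the columns in the order key, value1, value2.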
Please refer here for details.
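For completeness, the format_string approach from the question can probably be made to work as well. The %08d conversion expects an integer argument, while the schema in the question declares every column as StringType, so the column has to be cast first. The sketch below assumes the fields contain only integers (a value that does not parse becomes null after the cast); the names FormatStringSketch and zeroPadNumeric are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, format_string, trim}

// Hypothetical helper, not part of Spark: zero-pads one integer-valued string column.
object FormatStringSketch {
  def zeroPadNumeric(df: DataFrame, name: String, width: Int): DataFrame =
    df.withColumn(
      name,
      // Builds a pattern such as "%08d"; trim removes stray spaces before the cast.
      format_string(s"%0${width}d", trim(col(name)).cast("long"))
    )
}
```

To pad both columns, the calls need to be chained on the returned DataFrame, for example zeroPadNumeric(zeroPadNumeric(dfWithSchema, "key", 8), "value2", 8), and show should be called afterwards rather than assigned to a val, since show returns Unit. Unlike lpad, the %08d width is a minimum, so longer numbers are not truncated.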
Upvotes: 9