Shahar Sadkovich

Reputation: 110

Apache Spark scala lowercase first letter using built-in function

I'm trying to lowercase the first letter of column values.

I can't find a way to lowercase only the first letter using built-in functions. I know there's initcap for capitalizing the data, but I'm trying to decapitalize. I tried using substring, but it looks a bit like overkill and didn't work:

val data = spark.sparkContext.parallelize(Seq(("Spark"),("SparkHello"),("Spark Hello"))).toDF("name")
data.withColumn("name",lower(substring($"name",1,1)) + substring($"name",2,?))

I know I can create a custom UDF, but I thought there may be a built-in solution for this.

Upvotes: 0

Views: 706

Answers (1)

mck

Reputation: 42352

You can use the Spark SQL substring method, which allows omitting the length argument (it then takes the substring all the way to the end of the string):

data.withColumn("name", concat(lower(substring($"name",1,1)), expr("substring(name,2)"))).show
+-----------+
|       name|
+-----------+
|      spark|
| sparkHello|
|spark Hello|
+-----------+

Note that you cannot concatenate columns with +; you need to use concat.
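If you do end up preferring the UDF route mentioned in the question, the per-row logic is a one-liner in plain Scala. This is only a sketch; decapFirst and decapFirstUdf are illustrative names, not part of any Spark API:

```scala
import org.apache.spark.sql.functions.udf

// Lowercase only the first character; null and empty strings pass through unchanged.
def decapFirst(s: String): String =
  if (s == null || s.isEmpty) s
  else s.substring(0, 1).toLowerCase + s.substring(1)

// Wrap the function as a Spark UDF and apply it to the column.
val decapFirstUdf = udf(decapFirst _)

data.withColumn("name", decapFirstUdf($"name")).show
```

The built-in concat/substring version above is preferable in practice, since a UDF is opaque to Catalyst and blocks query optimization.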

Upvotes: 1
