Shahar Sadkovich

Reputation: 110

Apache Spark scala lowercase first letter using built-in function

I'm trying to lowercase the first letter of column values.

I can't find a way to lowercase only the first letter using built-in functions. I know there's initcap for capitalizing the data, but I'm trying to decapitalize. I tried using substring, but it looks a bit like overkill and didn't work:

val data = spark.sparkContext.parallelize(Seq(("Spark"),("SparkHello"),("Spark Hello"))).toDF("name")
data.withColumn("name",lower(substring($"name",1,1)) + substring($"name",2,?))

I know I can create a custom UDF, but I thought there may be a built-in solution for this.

Upvotes: 0

Views: 706

Answers (1)

mck

Reputation: 42352

You can use the Spark SQL substring method, which allows omitting the length argument (it then takes the substring all the way to the end of the string):

data.withColumn("name", concat(lower(substring($"name",1,1)), expr("substring(name,2)"))).show
+-----------+
|       name|
+-----------+
|      spark|
| sparkHello|
|spark Hello|
+-----------+

Note that you cannot concatenate columns with +; you need to use concat.
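If you do end up preferring the UDF route mentioned in the question, the per-row logic is a one-liner in plain Scala. This is only a sketch; decapFirst and decapFirstUdf are illustrative names, not part of any Spark API:

```scala
import org.apache.spark.sql.functions.udf

// Lowercase only the first character; null and empty strings pass through unchanged.
def decapFirst(s: String): String =
  if (s == null || s.isEmpty) s
  else s.substring(0, 1).toLowerCase + s.substring(1)

// Wrap the function as a Spark UDF and apply it to the column.
val decapFirstUdf = udf(decapFirst _)

data.withColumn("name", decapFirstUdf($"name")).show
```

The built-in concat/substring version above is preferable in practice, since a UDF is opaque to Catalyst and blocks query optimization.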

Upvotes: 1
