Reputation: 89
Input Dataframe
val ds = Seq((1,"play framework"),
(2,"spark framework"),
(3,"spring framework ")).toDF("id","subject")
I expect the subject column to be in title case, as follows:
val ds = Seq((1,"Play Framework"),
(2,"Spark Framework"),
(3,"Spring Framework ")).toDF("id","subject")
I could use the lower function from org.apache.spark.sql.functions, like
ds.select($"subject", lower($"subject")).show
to convert the column to lower case. But how can I get the result I expect, as shown above?
Upvotes: 1
Views: 10719
Reputation: 1892
You can do it like this, with a UDF:
// Split on single spaces, capitalize each word, and join the words back together
val capitalizeUDF = udf((str: String) => str.split(" ").map(word => word.trim.capitalize).mkString(" "))
ds.select($"id", capitalizeUDF($"subject").alias("subject")).show
or
ds.select($"id",initcap($"subject").alias("subject")).show
Sample output:
+---+----------------+
| id|         subject|
+---+----------------+
|  1|  Play Framework|
|  2| Spark Framework|
|  3|Spring Framework|
+---+----------------+
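To see what the UDF approach does to each value without starting Spark, here is a minimal plain-Scala sketch of the same word-by-word logic. The names TitleCaseSketch and titleCase are made up for illustration, and the null guard and the split on any run of whitespace are my own additions, not part of the UDF above.

```scala
object TitleCaseSketch {
  // Hypothetical helper: capitalizes the first letter of every word.
  // Option(s) guards against null input, which a Spark column can contain.
  def titleCase(s: String): String =
    Option(s)
      .map(_.trim.split("\\s+").filter(_.nonEmpty).map(_.capitalize).mkString(" "))
      .orNull

  def main(args: Array[String]): Unit = {
    // Same sample values as in the question, including the trailing space
    val subjects = Seq("play framework", "spark framework", "spring framework ")
    subjects.map(titleCase).foreach(println)
  }
}
```

If this is wrapped in udf(titleCase _), it should behave like the UDF above for the sample data, while also tolerating nulls and repeated spaces.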
Upvotes: 1
Reputation: 41957
There is a built-in function called initcap
that does exactly what you require:
import org.apache.spark.sql.functions._
ds.withColumn("subject", initcap(col("subject"))).show(false)
The official documentation says:
public static Column initcap(Column e) Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
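As a rough end-to-end sketch, the snippet below builds the DataFrame from the question and applies initcap. It assumes a local SparkSession, and it also wraps the column in trim so the trailing space in "spring framework " is dropped; that step is my addition and not part of the answer above. The names InitcapSketch and titleCaseSubject are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, initcap, trim}

object InitcapSketch {
  // Hypothetical helper: title-cases the "subject" column in place.
  // trim is an extra step (an assumption) to remove surrounding whitespace.
  def titleCaseSubject(df: DataFrame): DataFrame =
    df.withColumn("subject", initcap(trim(col("subject"))))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("initcap-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      (1, "play framework"),
      (2, "spark framework"),
      (3, "spring framework ")
    ).toDF("id", "subject")

    titleCaseSubject(ds).show(false)
    spark.stop()
  }
}
```

Using the built-in function instead of a UDF keeps the expression visible to Spark's optimizer, which is usually the reason to prefer it when it fits.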
Upvotes: 7