Sumi

Reputation: 23

Blank spaces in string | Spark Scala

I have a trailing non-breaking space in the strings of a column. I have tried the solutions below, but none of them gets rid of the space.

df.select(
  col("city"),
  regexp_replace(col("city"), " ", ""),
  regexp_replace(col("city"), "[\\r\\n]", ""),
  regexp_replace(col("city"), "\\s+$", ""),
  rtrim(col("city"))
).show()


Is there any other possible solution I can try to remove the blank space?
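A quick plain-Scala check (no Spark needed) suggests why each of these attempts leaves the character in place. It assumes the stray character really is U+00A0 (NO-BREAK SPACE). Spark's `regexp_replace` compiles its pattern with `java.util.regex`, so the same rules apply inside Spark. The `" "` pattern matches only U+0020, `[\r\n]` matches only CR/LF, and by default `\s` covers only `[ \t\n\x0B\f\r]`. The `(?U)` embedded flag (`Pattern.UNICODE_CHARACTER_CLASS`) widens `\s` to Unicode whitespace, which does include U+00A0. The names `nbsp` and `city` are illustrative only.

```scala
import java.util.regex.Pattern

// Illustrative values: U+00A0 (NO-BREAK SPACE) built from its code point
val nbsp: Char = 0xA0.toChar
val city: String = "Bengaluru" + nbsp

// Default \s is [ \t\n\x0B\f\r], so it does not match U+00A0
val defaultClassMatches: Boolean = Pattern.compile("\\s+$").matcher(city).find()

// (?U) enables UNICODE_CHARACTER_CLASS: \s then means Unicode White_Space, which includes U+00A0
val unicodeClassMatches: Boolean = Pattern.compile("(?U)\\s+$").matcher(city).find()

// String.trim removes only characters <= U+0020, so U+00A0 survives it too
val trimmed: String = city.trim

println("default \\s matches trailing U+00A0: " + defaultClassMatches)
println("(?U)\\s matches trailing U+00A0: " + unicodeClassMatches)
println("String.trim removed it: " + (trimmed != city))
```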

Upvotes: 0

Views: 358

Answers (1)

Koedlt

Reputation: 5973

You can use the ltrim, rtrim or trim functions from org.apache.spark.sql.functions:

import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq(
  "Bengaluru   ",
  "   Bengaluru",
  "   Bengaluru    "
).toDF("city")

df.show
+----------------+
|            city|
+----------------+
|    Bengaluru   |
|       Bengaluru|
|   Bengaluru    |
+----------------+

df.withColumn("city", ltrim(col("city"))).show
+-------------+
|         city|
+-------------+
| Bengaluru   |
|    Bengaluru|
|Bengaluru    |
+-------------+

df.withColumn("city", rtrim(col("city"))).show
+------------+
|        city|
+------------+
|   Bengaluru|
|   Bengaluru|
|   Bengaluru|
+------------+

df.withColumn("city", trim(col("city"))).show
+---------+
|     city|
+---------+
|Bengaluru|
|Bengaluru|
|Bengaluru|
+---------+

Choose the one that matches whether you want to remove leading spaces, trailing spaces, or both.
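Without a trim-string argument, these functions strip the ordinary space (U+0020). A trailing non-breaking space (U+00A0) like the one in the question would therefore survive. The sketch below is a minimal workaround under these assumptions: Spark 2.x or 3.x, and the offending character is exactly U+00A0. It adds U+00A0 to the regex character class passed to `regexp_replace`, because Java's default `\s` does not cover it. The session settings, `nbsp`, and the sample rows are illustrative. In spark-shell, `getOrCreate()` simply returns the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Illustrative local session; in spark-shell this reuses the predefined one
val spark = SparkSession.builder().master("local[*]").appName("nbsp-trim").getOrCreate()
import spark.implicits._

// U+00A0 built from its code point to avoid source-encoding surprises
val nbsp = 0xA0.toChar.toString

val df = Seq(
  "Bengaluru" + nbsp,              // trailing no-break space only
  "Bengaluru " + nbsp + " ",       // mix of ordinary and no-break spaces
  "Bengaluru"                      // already clean
).toDF("city")

// Java regex \s does not match U+00A0 by default, so list it explicitly as \xA0
val cleaned = df.withColumn("city", regexp_replace(col("city"), "[\\s\\xA0]+$", ""))

cleaned.show()
```

The pattern `(?U)\\s+$` should behave the same way, since `(?U)` makes `\s` Unicode-aware. To clean both ends, a pattern such as `^[\\s\\xA0]+|[\\s\\xA0]+$` can be used instead.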

Hope this helps!

Upvotes: 1
