Reputation: 53
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| G K| 0 |
| 2| L_L| 1 |
| 3|null| 1 |
+---+----+-----+
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| GK| 0 |
| 2| LL| 1 |
| 3|null| 1 |
+---+----+-----+
I just want to update the dept with new value which have underscore and space removed. Can This is possibe.
Upvotes: 0
Views: 101
Reputation: 29155
scala> val inputDf = Seq((1,"G K","0 "), (2,"L_L","1"), (3,null," 1")).toDF("sno","dept","color")
inputDf: org.apache.spark.sql.DataFrame = [sno: int, dept: string ... 1 more field]
scala> inputDf.show
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| G K| 0 |
| 2| L_L| 1|
| 3|null| 1|
+---+----+-----+
Q: I just want to update the dept with new value which have underscore and space removed. Can This is possibe?.
yes it is...
inputDf.withColumn("dept",regexp_replace('dept , "_" ,"")) // replace underscore with empty string
.withColumn("dept",regexp_replace('dept , " " ,"")) // replace space with empty string
.withColumn("color", trim('color)).show // if you want to trim which has extra space
.show
Result :
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| GK| 0|
| 2| LL| 1|
| 3|null| 1|
+---+----+-----+
OR
Smarter approach
1) \s|_
for only space and underscore.
2) Using underscore or removing any non alphanumeric use regex \W|_
val inputDf = Seq((1, "G K", "0 "), (2, "L_L", "1"), (3, null, "1")).toDF("sno", "dept", "color")
inputDf.show
inputDf.withColumn("dept", regexp_replace('dept, """\s|_""", "")).show
Results :
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| GK| 0 |
| 2| L_L| 1|
| 3|null| 1|
+---+----+-----+
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| GK| 0 |
| 2| LL| 1|
| 3|null| 1|
+---+----+-----+
I hope this is exactly what you are looking for.
Upvotes: 1
Reputation: 4045
You can use regexp_replace
and trim
udf for this as below
import org.apache.spark.sql.functions._
object SampleDF {
def main(args: Array[String]): Unit = {
val spark = Constant.getSparkSess
import spark.implicits._
val inputDf = Seq((1,"G K","0 "),
(2,"L-L","1"),
(3,null," 1")).toDF("sno","dept","color")
inputDf
.withColumn("dept",regexp_replace($"dept"," |-",""))
.withColumn("color",trim($"color"))
.show()
}
}
Upvotes: 1