Aditya Seth

Reputation: 53

Update Column in Spark Scala

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|  0  |
|  2| L_L|  1  |
|  3|null|  1  |
+---+----+-----+
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| GK|  0  |
|  2| LL|  1  |
|  3|null|  1  |
+---+----+-----+

I just want to update dept with a new value that has the underscore and space removed. Is this possible?

Upvotes: 0

Views: 101

Answers (2)

Ram Ghadiyaram

Reputation: 29155

scala> val inputDf = Seq((1,"G K","0 "), (2,"L_L","1"), (3,null,"  1")).toDF("sno","dept","color")
inputDf: org.apache.spark.sql.DataFrame = [sno: int, dept: string ... 1 more field]

scala> inputDf.show
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|   0 |
|  2| L_L|    1|
|  3|null|    1|
+---+----+-----+

Q: I just want to update dept with a new value that has the underscore and space removed. Is this possible?

Yes, it is:

inputDf.withColumn("dept", regexp_replace('dept, "_", "")) // replace underscore with empty string
  .withColumn("dept", regexp_replace('dept, " ", "")) // replace space with empty string
  .withColumn("color", trim('color)) // trim color, which has extra spaces
  .show

Result :

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  GK|    0|
|  2|  LL|    1|
|  3|null|    1|
+---+----+-----+

OR

Smarter approach

1) Use \s|_ to remove only whitespace and underscores.

2) To remove underscores along with any other non-alphanumeric character, use the regex \W|_

val inputDf = Seq((1, "G K", "0 "), (2, "L_L", "1"), (3, null, "1")).toDF("sno", "dept", "color")
inputDf.show

inputDf.withColumn("dept", regexp_replace('dept, """\s|_""", "")).show

Results :

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1| G K|   0 |
|  2| L_L|    1|
|  3|null|    1|
+---+----+-----+

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  GK|   0 |
|  2|  LL|    1|
|  3|null|    1|
+---+----+-----+
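To see the difference between the two patterns without spinning up Spark, here is a minimal plain-Scala sketch using String.replaceAll, which applies the same Java regex semantics that regexp_replace applies per row (the sample value "A-B.C" is a made-up illustration, not from the question):

```scala
// Plain-Scala sketch of the two regexes from the answer above.
object RegexDemo extends App {
  val samples = Seq("G K", "L_L", "A-B.C")

  // \s|_ removes only whitespace and underscores
  println(samples.map(_.replaceAll("""\s|_""", ""))) // List(GK, LL, A-B.C)

  // \W|_ removes underscores and every other non-alphanumeric character
  println(samples.map(_.replaceAll("""\W|_""", ""))) // List(GK, LL, ABC)
}
```

Note that \W alone is not enough, because underscore counts as a word character in Java regex; that is why the underscore must be added as an explicit alternative.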

I hope this is exactly what you are looking for.

Upvotes: 1

QuickSilver

Reputation: 4045

You can use the regexp_replace and trim functions for this, as below:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SampleDF {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._
    val inputDf = Seq((1, "G K", "0 "),
      (2, "L-L", "1"),
      (3, null, "  1")).toDF("sno", "dept", "color")

    inputDf
      .withColumn("dept", regexp_replace($"dept", " |-", "")) // remove spaces and hyphens from dept
      .withColumn("color", trim($"color")) // strip leading/trailing spaces from color
      .show()
  }

}


Upvotes: 1
