Ayeza Malik

Reputation: 129

Modify DataFrame values against a particular value in Spark Scala

Here is my code:

val spark: SparkSession = SparkSession.builder().appName("ReadFiles").master("local[*]").getOrCreate()

import spark.implicits._

val data: DataFrame = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv("Resources/atom.csv")

data.show()

The data looks like this:

ID  Name  City  newCol1  newCol2
1   Ali   lhr   null     null
2   Ahad  swl   1        10
3   Sana  khi   null     null
4   ABC   xyz   null     null

New list of values:

val nums: List[Int] = List(10,20)

I want to set these values in the row where ID=4, so that the DataFrame looks like this:

ID  Name  City  newCol1  newCol2
1   Ali   lhr   null     null
2   Ahad  swl   1        10
3   Sana  khi   null     null
4   ABC   xyz   10       20

Is this possible? Any help would be appreciated. Thanks.

Upvotes: 1

Views: 49

Answers (1)

notNull

Reputation: 31460

Yes, it's possible. Use a when/otherwise expression on each column: set the new value where ID is 4 and keep the existing value everywhere else.

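import org.apache.spark.sql.functions._

// For each target column: take the new value where ID is 4, otherwise keep the existing value.
// `data` and `nums` are the DataFrame and list defined in the question.
data.withColumn("newCol1", when(col("ID") === 4, lit(nums(0))).otherwise(col("newCol1")))
  .withColumn("newCol2", when(col("ID") === 4, lit(nums(1))).otherwise(col("newCol2")))
  .show()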

//+---+----+----+-------+-------+
//| ID|Name|City|newCol1|newCol2|
//+---+----+----+-------+-------+
//|  1| Ali| lhr|   null|   null|
//|  2|Ahad| swl|      1|     10|
//|  3|Sana| khi|   null|   null|
//|  4| ABC| xyz|     10|     20|
//+---+----+----+-------+-------+
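If the list gets longer, or the target columns change, the same when/otherwise pattern can be folded over (column, value) pairs instead of writing one withColumn call per column. This is a rough sketch only, meant for spark-shell or a script: `setValuesForId` is a hypothetical helper name, not part of Spark, and the in-memory rows are an assumed stand-in for atom.csv.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical helper (not a Spark API): for rows where ID equals `id`,
// overwrite each column in `targetCols` with the matching entry of `values`.
def setValuesForId(df: DataFrame, id: Int, targetCols: Seq[String], values: Seq[Int]): DataFrame = {
  require(targetCols.length == values.length, "need exactly one value per target column")
  targetCols.zip(values).foldLeft(df) { case (acc, (name, value)) =>
    // Same expression as above: new value on the matching row, existing value elsewhere.
    acc.withColumn(name, when(col("ID") === id, lit(value)).otherwise(col(name)))
  }
}

val spark: SparkSession = SparkSession.builder().appName("UpdateById").master("local[*]").getOrCreate()
import spark.implicits._

// Assumed stand-in for Resources/atom.csv so the sketch runs on its own.
val data: DataFrame = Seq[(Int, String, String, Option[Int], Option[Int])](
  (1, "Ali", "lhr", None, None),
  (2, "Ahad", "swl", Some(1), Some(10)),
  (3, "Sana", "khi", None, None),
  (4, "ABC", "xyz", None, None)
).toDF("ID", "Name", "City", "newCol1", "newCol2")

val nums: List[Int] = List(10, 20)
val updated: DataFrame = setValuesForId(data, 4, Seq("newCol1", "newCol2"), nums)
updated.show()
```

Because withColumn replaces a column that already exists under the same name, the column order stays the same and rows with other IDs keep their original values, including nulls.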

Upvotes: 2
