SK Sayyad
SK Sayyad

Reputation: 85

How can I supply multiple conditions in spark startsWith() function?

I have a dataset with 5 Million records, I need to replace all the values in column using startsWith() supplying multiple or and conditions.

This code works for a single condition:

df2.withColumn("Deposits", when(col("Deposits").startsWith("0"),"1.1").otherwise(col("Deposits"))).show()

I need to do this:

df2.withColumn("Deposits", when(col("Deposits").startsWith("0"||"1"),"1.1").otherwise(col("Deposits")))

Upvotes: 5

Views: 2894

Answers (2)

blackbishop
blackbishop

Reputation: 32680

Use rlike with regex ^(1|0) (starts with 1 or 0):

df2.withColumn(
  "Deposits",
  when(col("Deposits").rlike("^(1|0)"), "1.1").otherwise(col("Deposits"))
).show

If you want to update Deposits column depending on the startwith condition, you can chain multiple when expressions like this:

val depositsCol = Seq(
  ("1", "1.1"),
  ("2", "1.2")
).foldLeft(col("Deposits")) { case (acc, (start, value)) =>
  when(col("Deposits").startsWith(start), value).otherwise(acc)
}

df2.withColumn("Deposits", depositsCol).show

Upvotes: 4

mck
mck

Reputation: 42382

You can separate the two conditions and combine them with || (or):

df2.withColumn(
    "Deposits",
    when(
        col("Deposits").startsWith("0") || col("Deposits").startsWith("1"),
        "1.1"
    ).otherwise(
        col("Deposits")
    )
)

Upvotes: 3

Related Questions