Reputation: 85
I have a dataset with 5 Million records, I need to replace all the values in column using startsWith()
supplying multiple or and conditions.
This code works for a single condition:
df2.withColumn("Deposits", when(col("Deposits").startsWith("0"),"1.1").otherwise(col("Deposits"))).show()
I need to do this:
df2.withColumn("Deposits", when(col("Deposits").startsWith("0"||"1"),"1.1").otherwise(col("Deposits")))
Upvotes: 5
Views: 2894
Reputation: 32680
Use rlike
with regex ^(1|0)
(starts with 1 or 0):
df2.withColumn(
"Deposits",
when(col("Deposits").rlike("^(1|0)"), "1.1").otherwise(col("Deposits"))
).show
If you want to update Deposits
column depending on the startwith condition, you can chain multiple when expressions like this:
val depositsCol = Seq(
("1", "1.1"),
("2", "1.2")
).foldLeft(col("Deposits")) { case (acc, (start, value)) =>
when(col("Deposits").startsWith(start), value).otherwise(acc)
}
df2.withColumn("Deposits", depositsCol).show
Upvotes: 4
Reputation: 42382
You can separate the two conditions and combine them with ||
(or):
df2.withColumn(
"Deposits",
when(
col("Deposits").startsWith("0") || col("Deposits").startsWith("1"),
"1.1"
).otherwise(
col("Deposits")
)
)
Upvotes: 3