Soumyadip Ghosh

Reputation: 188

Adding new Column based on Old Column in Spark DataFrame

I have a dataframe as follows.

key     | value
inv_1_c | 5
inv_1_v | 8
inv_2_c | 9

I would like to add two columns to the dataframe, Voltage and Current.

Voltage would be value if key ends with "_v", or 0 otherwise.

Current would be value if key ends with "_c", or 0 otherwise.

What would be the Scala Spark code for this?

Upvotes: 2

Views: 1054

Answers (1)

koiralo

Reputation: 23099

You can use the substring function to get the last two characters of key, check whether they equal _v or _c, and add the two new columns with withColumn:

import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF; spark is your SparkSession (already in scope in spark-shell)

val data = Seq(
  ("inv_1_c", "5"),
  ("inv_1_v", "8"),
  ("inv_2_c", "9")
).toDF("key", "value")

// Extract the last two characters of key, then pick value or 0 per column.
data.withColumn("temp", substring($"key", -2, 2))
    .withColumn("voltage", when($"temp" === "_v", $"value").otherwise(0))
    .withColumn("current", when($"temp" === "_c", $"value").otherwise(0))
    .drop("temp")
    .show(false)

Output:

+-------+-----+-------+-------+
|key    |value|voltage|current|
+-------+-----+-------+-------+
|inv_1_c|5    |0      |5      |
|inv_1_v|8    |8      |0      |
|inv_2_c|9    |0      |9      |
+-------+-----+-------+-------+
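As a quick sanity check of the per-row rule (independent of Spark), the same logic can be sketched in plain Scala with endsWith — this is just an illustration of what the when/otherwise expressions compute for each key, not part of the Spark job:

```scala
// Plain-Scala sketch of the per-row logic, using the sample data above.
// voltage = value if key ends with "_v", else "0"; current likewise for "_c".
val rows = Seq(("inv_1_c", "5"), ("inv_1_v", "8"), ("inv_2_c", "9"))

val result = rows.map { case (key, value) =>
  val voltage = if (key.endsWith("_v")) value else "0"
  val current = if (key.endsWith("_c")) value else "0"
  (key, value, voltage, current)
}

result.foreach(println)
```

The same endsWith idea also works directly on the DataFrame ($"key".endsWith("_v") inside when), which avoids the temporary column.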

Hope this helps!

Upvotes: 3
