Dan
Dan

Reputation: 524

How to update dataframe column in Spark Scala after join?

Join of two dataframes results into almost 60 columns. Most of them suppose to stay as is, but some require update based on values in other columns. Is there a way to update those columns w/o calculating new, removing the originals and renaming the calculated back?

Simplified example: the revenue in $"Sales column from the left dataframe is supposed to be weighted by the $"Weight in the join results. Is there an efficient way to make the calculation w/o generating the $"SalesWeighted as a new column, dropping the original $Sales and re-naming $SalesWeighted into $Sales?

val l = Seq((1, 50), (2, 35), (3, 66))
            .toDF("Id", "Sales")

val r = Seq((1, "Premium", 0.2), (1, "Standard", 0.8), 
            (2, "Premium", 0.4), (2, "Standard", 0.6), 
            (3, "Premium", 0.333), (3, "Standard", 0.333), (3, "Garbage", 0.334))
            .toDF("Id", "Grade", "Weight")

display(l.join(r, Seq("Id")).withColumn("SalesWeighted", $"Sales"*$"Weight")
            .orderBy($"Id", $"Grade"))

Upvotes: 0

Views: 1800

Answers (2)

Leo C
Leo C

Reputation: 22449

You can simply name the new column the same as the column to be replaced:

l.join(r, Seq("Id")).withColumn("Sales", $"Sales" * $"Weight").
  orderBy($"Id", $"Grade")

Or, just use select:

l.join(r, Seq("Id")).
  select($"Id", $"Grade", $"Weight", ($"Sales" * $"Weight").as("Sales")).
  orderBy($"Id", $"Grade")

Upvotes: 1

loneStar
loneStar

Reputation: 4010

Use Drop to remove the unnecessary columns

val l = Seq((1, 50), (2, 35), (3, 66))
                .toDF("Id", "Sales")

    val r = Seq((1, "Premium", 0.2), (1, "Standard", 0.8), 
                (2, "Premium", 0.4), (2, "Standard", 0.6), 
                (3, "Premium", 0.333), (3, "Standard", 0.333), (3, "Garbage", 0.334))
                .toDF("Id", "Grade", "Weight")

    display(l.join(r, Seq("Id")).withColumn("SalesWeighted", $"Sales"*$"Weight").drop($"Sales")
                .orderBy($"Id", $"Grade"))

Upvotes: 1

Related Questions