elm

Reputation: 20435

Sum up rows in DataFrame

Given a DataFrame, for instance

val df = sc.parallelize(Seq((1L, 0.1), (2L, 0.2), (3L, 0.3))).toDF("k","v")

df.show
+---+---+
|  k|  v|
+---+---+
|  1|0.1|
|  2|0.2|
|  3|0.3|
+---+---+

how can I sum each row into a new column named totals, so that dfTotals.show prints

+---+---+--------+
|  k|  v|  totals|
+---+---+--------+
|  1|0.1|     1.1|
|  2|0.2|     2.2|
|  3|0.3|     3.3|
+---+---+--------+

Upvotes: 1

Views: 552

Answers (1)

elm

Reputation: 20435

I found a solution that is simpler than I originally expected: add the columns as a Column expression and attach the result with withColumn,

val totals = ($"k" + $"v")
val dfTotals = df.withColumn("totals", totals)

and so

dfTotals.show
+---+---+------+
|  k|  v|totals|
+---+---+------+
|  1|0.1|   1.1|
|  2|0.2|   2.2|
|  3|0.3|   3.3|
+---+---+------+
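
If there are more than two columns, the same idea can probably be generalized by folding all the columns into one expression. The following is only a sketch and rests on a few assumptions: Spark 2.x or later (so SparkSession is available, whereas the question uses the older sc-based style), a local master, and every column being numeric. The appName "row-totals" is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session is created here; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-totals") // hypothetical name, for illustration only
  .getOrCreate()
import spark.implicits._

// Same data as in the question.
val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")

// Turn every column name into a Column and add them pairwise,
// giving a single expression equivalent to $"k" + $"v".
// Assumption: all columns are numeric.
val totals = df.columns.map(col).reduce(_ + _)

val dfTotals = df.withColumn("totals", totals)
dfTotals.show()
```

With only k and v this builds the same expression as above; the reduce simply avoids writing out each column by hand.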

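Update: another approach, though less neat, is to select every column explicitly and alias the sum (without the alias the new column is named after the expression rather than totals),

df.select(df("k"), df("v"), (df("k") + df("v")).as("totals"))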

Upvotes: 1
