summerbulb

Reputation: 5879

How to sum each row with the next one?

In Spark SQL version 1.6, using DataFrames, is there a way to calculate, for a specific column, the sum of the current row and the next one, for every row?

For example, if I have a table with one column, like so

Age
12
23
31
67

I'd like the following output

Sum
35
54
98

The last row is dropped because it has no "next row" to be added to.

Right now I am doing it by ranking the table and joining it with itself, where one table's rank equals the other's rank + 1.
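For reference, the rank-and-self-join approach described above can be sketched roughly like this (column names and the `sqlContext` setup are assumptions, not from the original post):

```scala
import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(12, 23, 31, 67).toDF("Age")

// Assign a rank to each row; a window without partitionBy works
// in Spark 1.6 but triggers a "no partition defined" warning
val w = Window.orderBy("Age")
val ranked = df.withColumn("rank", row_number() over w)

// Shift the rank down by one so each row lines up with its successor,
// then the inner join drops the last row (it has no successor)
val next = ranked.select((col("rank") - 1) as "rank", col("Age") as "nextAge")
val sums = ranked.join(next, "rank").select((col("Age") + col("nextAge")) as "Sum")
```

This needs two passes over the data (rank, then join), which is what the question is trying to avoid.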

Is there a better way to do this? Can this be done with a Window function?

Upvotes: 2

Views: 1333

Answers (1)

Ramesh Maharjan

Reputation: 41987

Yes, you can definitely do this with a Window function using rowsBetween. I have added a person column for grouping purposes in the following example.

import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val dataframe = Seq(
  ("A", 12),
  ("A", 23),
  ("A", 31),
  ("A", 67)
).toDF("person", "Age")

// Window covering the current row and the next row within each person group
val windowSpec = Window.partitionBy("person").orderBy("Age").rowsBetween(0, 1)
val newDF = dataframe.withColumn("sum", sum(dataframe("Age")) over windowSpec)

// The last row of each partition sums only itself (Age === sum), so drop it
newDF.filter(!(newDF("Age") === newDF("sum"))).show
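A variation on the same window, not from the original answer, is the `lead` function, which reads the next row's value directly. It gives a null on the last row of each partition, so that row can be dropped with an explicit null check instead of the `Age === sum` comparison (which would also wrongly drop a row whose next value happens to be 0):

```scala
import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(("A", 12), ("A", 23), ("A", 31), ("A", 67)).toDF("person", "Age")

val w = Window.partitionBy("person").orderBy("Age")

val result = df
  .withColumn("next", lead("Age", 1) over w)  // null on the last row of each group
  .filter(col("next").isNotNull)              // drop rows with no successor
  .select((col("Age") + col("next")) as "Sum")
```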

Upvotes: 1
