Kevin

Reputation: 301

Adding new column of previous date for account in table

Suppose you have a table with snapshots of accounts from the past 4 years, 1 row per date/account pair. (Note that accounts need not have a row for every day.)

For simplicity suppose there are 3 columns: acct_number, date and balance. How would you add 2 more columns to each row that are the date/balance for that account for the previous date recorded in the table?

Sorting by acct_number and date, "offsetting by one", and joining back to the original table seems like a promising approach, but I don't know how to implement it.

Upvotes: 1

Views: 139

Answers (1)

Kevin

Reputation: 301

The solution is to use a window function with a lag of 1, partitioned by account and ordered by date.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

val df = load_my_df()

// One window per account, rows ordered by date within each account
val frame = Window.partitionBy("acct_number").orderBy("date")

// lag(col, 1) returns the value from the previous row in the window,
// or null for the first row of each account
val df_new = df
  .withColumn("prev_date", lag("date", 1).over(frame))
  .withColumn("prev_balance", lag("balance", 1).over(frame))
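To make the window semantics concrete, here is a plain-Python sketch (no Spark required) of what lag(1) over a window partitioned by acct_number and ordered by date computes. The function name and sample data are illustrative, not part of any Spark API.

```python
from itertools import groupby

def add_prev_columns(rows):
    """Mimic lag(col, 1) over Window.partitionBy('acct_number').orderBy('date').

    rows: list of dicts with keys 'acct_number', 'date', 'balance'.
    Returns new dicts with 'prev_date'/'prev_balance' set to the values from
    the previous row of the same account, or None for the account's first
    row (matching lag's null default).
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r["acct_number"], r["date"]))
    for _, group in groupby(ordered, key=lambda r: r["acct_number"]):
        prev = None  # no predecessor at the start of each partition
        for r in group:
            out.append({**r,
                        "prev_date": prev["date"] if prev else None,
                        "prev_balance": prev["balance"] if prev else None})
            prev = r
    return out
```

Each account is an independent partition: the first snapshot of every account gets nulls, and later snapshots pick up the immediately preceding date/balance for that account only, exactly as the Spark window above does.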


Great examples here: https://sparkbyexamples.com/spark/spark-sql-window-functions/

Documentation: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

Upvotes: 2
