inspiredd
inspiredd

Reputation: 217

Return value from prior year Spark sql

I have the following table:

YearMonth value
202101 5
202102 7
.... 10
202201 4

I need to create third column with the value from the previous year. When I have 202201 there should be valuePriorYear equals 5 (value from 202101). I've tried with the lag and over partition but I'm able select only value from the previous month

Upvotes: 0

Views: 55

Answers (1)

vilalabinot
vilalabinot

Reputation: 1611

Let's assume you have this dataset:

val df = spark.sparkContext.parallelize(Seq(
  (202101, 5),
  (202102, 7),
  (202004, 8),
  (202104, 6),
  (202201, 0)
)).toDF("YearMonth", "value")

You can add to YearMonth the value of 100 (since you are working with yearweeks) and rename your value (I don't really know your case, but for the sake of clarity).

val df2 = df
  .withColumn("YearMonth", col("YearMonth").plus(100))
  .withColumnRenamed("value", "value2")

Then finally join back:

val df3 = df.join(df2, Seq("YearMonth"), "left")

Final result:

+---------+-----+------+
|YearMonth|value|value2|
+---------+-----+------+
|202004   |8    |null  |
|202101   |5    |null  |
|202102   |7    |null  |
|202104   |6    |8     |
|202201   |0    |5     |
+---------+-----+------+

I hope this is useful, good luck!

Upvotes: 1

Related Questions