Reputation: 217
I have the following table:
YearMonth | value |
---|---|
202101 | 5 |
202102 | 7 |
.... | 10 |
202201 | 4 |
I need to create third column with the value from the previous year. When I have 202201 there should be valuePriorYear equals 5 (value from 202101). I've tried with the lag and over partition but I'm able select only value from the previous month
Upvotes: 0
Views: 55
Reputation: 1611
Let's assume you have this dataset:
val df = spark.sparkContext.parallelize(Seq(
(202101, 5),
(202102, 7),
(202004, 8),
(202104, 6),
(202201, 0)
)).toDF("YearMonth", "value")
You can add to YearMonth
the value of 100
(since you are working with yearweeks) and rename your value
(I don't really know your case, but for the sake of clarity).
val df2 = df
.withColumn("YearMonth", col("YearMonth").plus(100))
.withColumnRenamed("value", "value2")
Then finally join back:
val df3 = df.join(df2, Seq("YearMonth"), "left")
Final result:
+---------+-----+------+
|YearMonth|value|value2|
+---------+-----+------+
|202004 |8 |null |
|202101 |5 |null |
|202102 |7 |null |
|202104 |6 |8 |
|202201 |0 |5 |
+---------+-----+------+
I hope this is useful, good luck!
Upvotes: 1