Reputation: 5
Add column in expr when subtracting the timestamp with column value.
df.withColumn("out", expr("timestamp - interval hour_part hours"))
Input:
id,hour_part,timestamp
1,1,2019-01-01 13:00:00
1,2,2019-01-01 14:00:00
1,2,2019-01-01 15:00:00
2,3,2019-01-01 17:00:00
2,4,2019-01-01 18:00:00
Output:
id,hour_part,timestamp,out
1,1,2019-01-01 13:00:00,2019-01-01 12:00:00
1,2,2019-01-01 14:00:00,2019-01-01 12:00:00
1,2,2019-01-01 15:00:00,2019-01-01 13:00:00
2,3,2019-01-01 17:00:00,2019-01-01 14:00:00
2,4,2019-01-01 18:00:00,2019-01-01 14:00:00
Error: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'hours' expecting (line 1, pos 28)
Upvotes: 0
Views: 645
Reputation: 2072
alternatively you can use this below approach,
import org.apache.spark.sql.functions._
val df=Seq(("1","1","2019-01-01 13:00:00"),
("1","2","2019-01-01 14:00:00"),
("1","2","2019-01-01 15:00:00"),
("2","3","2019-01-01 17:00:00"),
("2","4","2019-01-01 18:00:00")).toDF("id","hour_part","timestamp")
df.withColumn("out", from_unixtime((unix_timestamp($"timestamp") - $"hour_part" * 60 * 60))).show()
/*
+---+---------+-------------------+-------------------+
| id|hour_part| timestamp| out|
+---+---------+-------------------+-------------------+
| 1| 1|2019-01-01 13:00:00|2019-01-01 12:00:00|
| 1| 2|2019-01-01 14:00:00|2019-01-01 12:00:00|
| 1| 2|2019-01-01 15:00:00|2019-01-01 13:00:00|
| 2| 3|2019-01-01 17:00:00|2019-01-01 14:00:00|
| 2| 4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
// using expr()
df.withColumn("out", expr(""" from_unixtime((unix_timestamp(timestamp) - hour_part * 60 * 60))""")).show()
/*
+---+---------+-------------------+-------------------+
| id|hour_part| timestamp| out|
+---+---------+-------------------+-------------------+
| 1| 1|2019-01-01 13:00:00|2019-01-01 12:00:00|
| 1| 2|2019-01-01 14:00:00|2019-01-01 12:00:00|
| 1| 2|2019-01-01 15:00:00|2019-01-01 13:00:00|
| 2| 3|2019-01-01 17:00:00|2019-01-01 14:00:00|
| 2| 4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
Upvotes: 1