chin99
chin99

Reputation: 5

Add Column in expr in spark

Add column in expr when subtracting the timestamp with column value.

df.withColumn("out", expr("timestamp - interval hour_part hours"))

Input:

id,hour_part,timestamp
1,1,2019-01-01 13:00:00
1,2,2019-01-01 14:00:00
1,2,2019-01-01 15:00:00
2,3,2019-01-01 17:00:00
2,4,2019-01-01 18:00:00

Output:

id,hour_part,timestamp,out
1,1,2019-01-01 13:00:00,2019-01-01 12:00:00
1,2,2019-01-01 14:00:00,2019-01-01 12:00:00
1,2,2019-01-01 15:00:00,2019-01-01 13:00:00
2,3,2019-01-01 17:00:00,2019-01-01 14:00:00
2,4,2019-01-01 18:00:00,2019-01-01 14:00:00

Error: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'hours' expecting (line 1, pos 28)

Upvotes: 0

Views: 645

Answers (1)

sathya
sathya

Reputation: 2072

alternatively you can use this below approach,

import org.apache.spark.sql.functions._

val df=Seq(("1","1","2019-01-01 13:00:00"),
("1","2","2019-01-01 14:00:00"),
("1","2","2019-01-01 15:00:00"),
("2","3","2019-01-01 17:00:00"),
("2","4","2019-01-01 18:00:00")).toDF("id","hour_part","timestamp")

df.withColumn("out", from_unixtime((unix_timestamp($"timestamp") - $"hour_part" * 60 * 60))).show()

/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
// using expr()
df.withColumn("out", expr(""" from_unixtime((unix_timestamp(timestamp) - hour_part * 60 * 60))""")).show()
/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/

Upvotes: 1

Related Questions