Reputation: 53
I have one column Start_Time with a timestamp, and one column Time_Zone_Offset, an integer. How can I add the Time_Zone_Offset to Start_Time as a number of hours?
Example MyTable:
id | Start_Time | Time_Zone_Offset
---|---|---
1 | 2020-01-12 00:00:00 | 1
2 | 2020-01-12 00:00:00 | 2
Desired output:
id | Local_Start_Time
---|---
1 | 2020-01-12 01:00:00
2 | 2020-01-12 02:00:00
Attempt:
SELECT id, Start_time + INTERVAL time_zone_offset HOURS AS Local_Start_Time
FROM MyTable
This doesn't seem to work, and I can't use from_utc_timestamp
as I don't have the actual timezone details, just the time-zone offset at the time being considered.
Upvotes: 3
Views: 13822
Reputation: 12243
For anyone else coming to this question and using Spark SQL on Databricks, the dateadd
function works in the same way as in most other SQL dialects:
select dateadd(microsecond,30,'2022-11-04') as microsecond
,dateadd(millisecond,30,'2022-11-04') as millisecond
,dateadd(second ,30,'2022-11-04') as second
,dateadd(minute ,30,'2022-11-04') as minute
,dateadd(hour ,30,'2022-11-04') as hour
,dateadd(day ,30,'2022-11-04') as day
,dateadd(week ,30,'2022-11-04') as week
,dateadd(month ,30,'2022-11-04') as month
,dateadd(quarter ,30,'2022-11-04') as quarter
,dateadd(year ,30,'2022-11-04') as year
microsecond | millisecond | second | minute | hour | day | week | month | quarter | year |
---|---|---|---|---|---|---|---|---|---|
2022-11-04T00:00:00.000+0000 | 2022-11-04T00:00:00.030+0000 | 2022-11-04T00:00:30.000+0000 | 2022-11-04T00:30:00.000+0000 | 2022-11-05T06:00:00.000+0000 | 2022-12-04T00:00:00.000+0000 | 2023-06-02T00:00:00.000+0000 | 2025-05-04T00:00:00.000+0000 | 2030-05-04T00:00:00.000+0000 | 2052-11-04T00:00:00.000+0000 |
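Applied to the question's table, that would look roughly like this (a sketch, assuming your runtime's dateadd accepts a column expression as the value argument):
SELECT id, dateadd(HOUR, Time_Zone_Offset, Start_Time) AS Local_Start_Time
FROM MyTable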
Upvotes: 6
Reputation: 546
Starting with Spark 3.0, you can use the make_interval(years, months, weeks, days, hours, mins, secs)
function if you want to add intervals using values from other columns.
SELECT
id
, Start_time + make_interval(0, 0, 0, 0, time_zone_offset, 0, 0) AS Local_Start_Time
FROM MyTable
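As a side note, the trailing arguments default to 0 in Spark 3.x, so a shorter variant like the following should also work (untested sketch, assuming the optional-argument defaults):
SELECT
id
, Start_Time + make_interval(0, 0, 0, 0, Time_Zone_Offset) AS Local_Start_Time
FROM MyTable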
Upvotes: 3
Reputation: 53
Adding an alternative to Benoit's answer, using a Python UDF:
from pyspark.sql import SQLContext
from datetime import datetime, timedelta
from pyspark.sql.types import TimestampType

# Defining a pyspark function to add hours onto a datetime
def addHours(my_datetime, hours):
    # Accounting for NULL (None in Python) values
    if (hours is None) or (my_datetime is None):
        adjusted_datetime = None
    else:
        adjusted_datetime = my_datetime + timedelta(hours=hours)
    return adjusted_datetime

# Registering the function as a UDF to use in SQL, and defining the output type as
# 'TimestampType' (this is important, the default is StringType)
sqlContext.udf.register("add_hours", addHours, TimestampType())
followed by:
SELECT id, add_hours(Start_Time, Time_Zone_Offset) AS Local_Start_Time
FROM MyTable
Upvotes: 2
Reputation: 499
(Hoping you are using PySpark.)
Indeed, I couldn't make it work with SQL, but I managed to get the result by converting to a unix timestamp. It's probably not the best way, but it works (I proceeded step by step to make sure the references were working; I thought I would need a user-defined function, but apparently not).
from pyspark.sql import functions as F

# Step 1: convert Start_Time to a unix timestamp (seconds since the epoch)
df2 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time"))
df2.show()

# Step 2: add the offset, converted from hours to seconds
df3 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time") + df["Time_Zone_Offset"] * 60 * 60)
df3.show()

# Step 3: convert back to a timestamp string (note the pattern: 'yyyy-MM-dd', not 'YYYY-MM-DD')
df4 = df3.withColumn("Start_Time", F.from_unixtime("Start_Time", "yyyy-MM-dd HH:00:00"))
df4.show()
Upvotes: 2