Reputation: 4512
I am using this query against a MySQL database:
select *,
       UNIX_TIMESTAMP(CONVERT_TZ(
           SUBSTRING(input_date, 1, 19),
           SUBSTRING(input_date, 20),
           'SYSTEM'
       )) as timestamp
from my_table
which converts an input_date string carrying a UTC offset (e.g. 2018-12-15T13:48:16-08:00) to epoch time.
I now need to do the same thing in PySpark, accessing that table through a JDBC connection, but when I try I get the following error:
Py4JJavaError: An error occurred while calling o62.sql.
: org.apache.spark.sql.AnalysisException: Undefined function: 'CONVERT_TZ'. This function is neither a registered temporary function nor a permanent function registered in the database ...
What am I doing wrong? Is there a better way to do this in PySpark?
Thanks
Upvotes: 0
Views: 401
Reputation: 7419
You can use a function like this to connect to the MySQL database:
def connect_to_sql(
    spark, jdbc_hostname, jdbc_port, database, data_table, username, password
):
    # Build the JDBC URL from host, port, and database name
    jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(jdbc_hostname, jdbc_port, database)
    connection_details = {
        "user": username,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",  # requires the MySQL Connector/J jar on the classpath
    }
    # Read the table (or a parenthesized subquery) into a Spark DataFrame
    df = spark.read.jdbc(url=jdbc_url, table=data_table, properties=connection_details)
    return df
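If you want to keep using CONVERT_TZ, one option (a sketch, not tested against a live server; the host, credentials, and alias names below are placeholders) is to push the query down to MySQL itself: `spark.read.jdbc` accepts a parenthesized subquery in place of a table name, and that subquery is executed by MySQL, where CONVERT_TZ is defined:

```python
# Hypothetical pushdown: the subquery runs inside MySQL (where CONVERT_TZ
# exists), so Spark only receives the already-computed epoch column.
subquery = (
    "(select *, "
    "UNIX_TIMESTAMP(CONVERT_TZ("
    "SUBSTRING(input_date, 1, 19), "
    "SUBSTRING(input_date, 20), "
    "'SYSTEM')) as epoch_ts "
    "from my_table) as t"
)

# Placeholder connection details; pass the subquery where the table name goes:
# df = connect_to_sql(spark, "localhost", 3306, "mydb", subquery, "user", "pass")
```

The alias (`as t`) at the end is required, since Spark treats the whole string as a derived table.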
Regarding the timezone conversion itself, this question will help you:
How to convert a Date String from UTC to Specific TimeZone in HIVE?
Upvotes: 1