Reputation: 237
I am trying to read the maximum id value in my table by using
_select_sql = f"(SELECT MAX(id) FROM {tablename})"
highest_id = spark.read.jdbc(url, table=_select_sql, properties=properties)
After executing this I am getting:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WHERE'
When I try to read all data using highest_id = spark.read.jdbc(url, table=tablename, properties=properties)
everything is fine.
Do you know where the mistake could be?
Edit:
After changing to
_select_sql = f"(SELECT MAX(id) FROM {tablename}"
I am getting:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '0'.
Upvotes: 3
Views: 4921
Reputation: 26
The second part of the issue (after the edit) comes from not closing the parenthesis at the end of your _select_sql query:
_select_sql = f"(SELECT MAX(id) FROM {tablename})"
The first part (original post) is that some JDBC drivers/source databases require that the subquery be given an alias in order to work. I know this is the case for SQL Server, which your query is executing against, but, for example, Oracle does not require it. E.g. for SQL Server:
_select_sql = f"(SELECT MAX(id) FROM {tablename}) PUSH_DOWN_TABLE"
The somewhat esoteric "Incorrect syntax near the keyword 'WHERE'" exception message is the result of not including an alias after the query in parentheses.
This specific usage of the "table" option to execute a query is a method for performing a "push down" query. Using the "table" option in spark.read.jdbc treats the query as a kind of table in the source database and only returns the result of your aggregate function "MAX". This reduces the amount of data that has to be transferred from the source database by performing the aggregation there and only sending the result.
Upvotes: 0
Reputation: 121
Spark internally creates a SQL statement like
SELECT * FROM <table>
So, if you want a statement as table parameter it has to look like this (just adding a name to the inline view):
_select_sql = f"(SELECT MAX(id) FROM {tablename}) tab"
and Spark's generated SQL will look like this and do the job:
SELECT * FROM (SELECT MAX(id) FROM mytable) tab
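A small sketch of the wrapping described above (the table name mytable is assumed, and the SELECT * FROM ... template mirrors what Spark effectively sends to the database):

```python
tablename = "mytable"  # hypothetical source table

# The value passed as the "table" parameter: a subquery with an alias.
table_param = f"(SELECT MAX(id) FROM {tablename}) tab"

# Spark wraps the "table" parameter in a SELECT * FROM ... statement,
# so the alias on the inline view is what keeps the final SQL valid.
generated_sql = f"SELECT * FROM {table_param}"

print(generated_sql)  # SELECT * FROM (SELECT MAX(id) FROM mytable) tab
```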
Upvotes: 1
Reputation: 237
I solved this by using
_select_sql = f"(SELECT TOP(1) id FROM {tablename} ORDER BY id DESC)"
Thanks anyway to Alex Ott; it only works in the way that he gave. I don't really know why, because some sources say this should work in a similar way:
df = spark.read.jdbc(url = url, table = query, properties = properties)
Upvotes: 0