Reputation: 1745
I am trying to create some temporary tables to grab most recent partitions and then inner join back on to main table to filter out older partitions. To do this, I would much rather prefer to use temporary tables rather than tons of nested subqueries. I am using tools such as SAS via pass-through proc sql to connect to the db and query the data.
Example:
proc sql;
CONNECT TO ODBC (USER=XXXXX PASSWORD="XXXXX" DSN="XXXXX");
CREATE TABLE max_partition as SELECT * FROM CONNECTION TO ODBC
(
CREATE TEMPORARY TABLE tmp1 as select MAX(partitiondate) as latestpartitiondate, id FROM tbl1 GROUP BY id;
select * from tmp1
);
quit;
I get errors around missing EOF by 'select' if I do not include the semicolon and I get an error if I do use the semicolon (as I have shown in the above example).
CLI prepare error: [Hortonworks][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message
from server: Error while compiling statement: FAILED: ParseException line 1:148 cannot recognize input near 'id' ';' ')'
I am trying to use temp tables in various tools such as Proc sql in SAS as well as pyodbc in Python. I still receive the same errors. I simply want to be able to generate a table of max partitions for that particular ID and then select from that temp table altogether in a single pass through statement.
Upvotes: 1
Views: 1101
Reputation: 38290
Some drivers may not support multiple sql statements in single session.
As a workaround, use permanent table loaded in separate session (insert overwrite) and reused in other sessions. Temp teble does not give you any improvement comparing with permanent one except it gets automatically dropped at the end and it can not be reused in different sessions.
Also WITH (CTE) or simple subquery in the FROM works faster than temp table if table is not reused. WITH can be materialized and reused multiple times in the same query.
Upvotes: 0
Reputation: 51566
The normal way is to run the prerequisite statements first and then run the query.
EXECUTE BY ODBC
(
CREATE TEMPORARY TABLE tmp1 as
select MAX(partitiondate) as latestpartitiondate
, id
FROM tbl1
GROUP BY id
);
CREATE TABLE max_partition as SELECT * FROM CONNECTION TO ODBC
(
select * from tmp1
);
Does that not work with HIVE?
Upvotes: 3
Reputation: 63424
In SAS proc sql, you can't do what you're doing - but you also don't need to, unless Hive has something very different from any other ODBC connection.
proc sql;
CONNECT TO ODBC (USER=XXXXX PASSWORD="XXXXX" DSN="XXXXX");
CREATE TABLE max_partition as SELECT * FROM CONNECTION TO ODBC
(
select MAX(partitiondate) as latestpartitiondate, id FROM tbl1 GROUP BY id;
);
quit;
If you do need to create the temporary table, you'd have to do it in an execute ( ) by connection to odbc
, which should be possible (I've done something similar with SQL Server, so as long as SAS and Hive implement ODBC connections in the same way it should be okay) and then in a following statement do the select
from it.
Upvotes: 2