Reputation: 589
I am using SCOTT schema in oracle and want to delete the duplicate values like this...
SELECT EMPNO, JOB, SAL
FROM ( SELECT EMPNO,
JOB,
SAL,
ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
FROM EMP_TEST
ORDER BY EMPNO)
WHERE R_ID = 1
Right now I want to perform this in SAS ... and following code performs fine....
PROC SQL;
connect to oracle (PATH="database_name" USER=SCOTT PASSWORD=tiger);
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
FROM ( SELECT EMPNO,
JOB,
SAL,
ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
FROM SCOTT.EMP_TEST
ORDER BY EMPNO)
WHERE R_ID = 1
);
DISCONNECT FROM oracle;
QUIT;
But I got error when I have to perform this type of query in dataset level.
Suppose my EMP_TEST is not oracle table, its actually SAS dataset in a particular path and I want to create another dataset named EMP_DEDUP from EMP_TEST dataset after remove the duplicate values. This is my approach.
PROC SQL;
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
FROM ( SELECT EMPNO,
JOB,
SAL,
ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
FROM path.EMP_TEST
ORDER BY EMPNO)
WHERE R_ID = 1
;
QUIT;
Here I found syntax error..... might be SAS does not support ROW_NUMBER () function. Any possible solution.. can anybody suggest...
Upvotes: 3
Views: 9090
Reputation: 63434
Sounds like you want the lowest SAL for each EMPNO, is that correct?
proc sort data=path.emp_test;
by empno sal;
run;
data work.emp_dedupl;
set path.emp_test;
by empno sal;
if first.empno;
run;
SAS doesn't support PARTITION OVER, so it wouldn't really allow you to do this in SQL.
Upvotes: 3
Reputation: 4538
If analytical functions are not supported then you can write something like this
PROC SQL;
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
FROM path.EMP_TEST
WHERE rowid in (select min(rowid)
from path.EMP_TEST
group by empno)
);
QUIT;
Upvotes: 1