Reputation: 293
I am trying to achieve the "last." functionality using Proc SQL instead of data step. Say I have a dataset like below:
Account_Id Dept Salary Emp_Status Projects Rating
111 123 7000 Perm A 5
111 123 7000 Perm B 4
111 123 7000 Perm C 5
222 124 6000 Perm A 5
333 125 7000 Perm B 4
333 125 7000 Perm C 5
I want only one row in my output for each account_id. So, i want the last.account_id. How can I achieve this using proc sql. I tried using max(monotnic()) while grouping on account_id but did not work. Can someone please help. Also, I cannot use or do a sub-query due to some standard project limitations. Is there any other way of doing this in proc sql?
Thanks in advance!
Upvotes: 1
Views: 30027
Reputation: 27518
You correctly state there are no automatic variables in SAS SQL equivalent to first. or last. The data will need to have columns that support a definitive within group ordering that can be utilized for MAX selection and then applied as join criteria. Projects
in your data is a possible candidate:
data have;
input Account_Id Dept Salary Emp_Status $ Projects $ Rating;
datalines;
111 123 7000 Perm A 5
111 123 7000 Perm B 4
111 123 7000 Perm C 5
222 124 6000 Perm A 5
333 125 7000 Perm B 4
333 125 7000 Perm C 5
run;
proc sql;
* standard sql query;
create table want as
select have.*
from have
join (select account_id, max(projects) as max_projects from have group by account_id) as matched
on matched.account_id = have.account_id
and matched.max_projects = have.projects
;
* SAS sql query that does magic auto remerge ;
create table want as
select have.*
from have
group by account_id
having projects = max(projects)
;
I would avoid monotonic()
, especially in SQL. The function is undocumented and not guaranteed to be present or perform equivalently in future releases. Your data really does need context columns for selecting the within group extrema.
Upvotes: 0
Reputation: 9569
The following appears to do what you want for the sample data you have posted, assuming that you care only about the row order of your input dataset rather than the values of any particular variable to determine the order within by-groups:
data have;
input Account_Id Dept Salary Emp_Status $ Projects $ Rating;
cards;
111 123 7000 Perm A 5
111 123 7000 Perm B 4
111 123 7000 Perm C 5
222 124 6000 Perm A 5
333 125 7000 Perm B 4
333 125 7000 Perm C 5
;
run;
proc sql;
create table want as
select *, monotonic() as row_id from have
group by account_id
having row_id = max(row_id);
quit;
This seems quite similar to what you say you've already tried, so if it doesn't work, please provide some sample input data that reproduce the problem.
In general I would advise against using monotonic()
in production code as it is undocumented and can cause unexpected results in more complex queries. When working with sql you should use a variable to define your row order.
Upvotes: 1