huadaixia
huadaixia

Reputation: 109

Is there a way to return the specific row number on SAS?

Here is the part of dataset:

   Obs Buffer
     ...
    75 14
    76 13 
    77 64 
    78 38.1% 
    79 29.2% 
    80 69.2% 
    81 33 
    82 5-12
     ... 

I only need the data containing "%" and the two rows ahead of this. For example, in this case I want to pull out "13" "64" "38.1%" "29.2%" and "69.2%".

Is there a way I can achieve this?

Upvotes: 1

Views: 739

Answers (3)

huadaixia
huadaixia

Reputation: 109

By following kungfujam's thought, I get the code below and it works.

data adjust;
set source;
oneBefore = lag1(Buffer);
twoBefore = lag2(Buffer);
threeBefore = lag3(Buffer);
fourBefore = lag4(Buffer);
if index(buffer,'%')^=0 and index(onebefore,'%')^=0 and index(twobefore,'%')^=0then do;
        x = fourBefore;
        output;
        x = threeBefore;
        output;
        x = twoBefore;
        output;
        x = oneBefore;
        output;
        x=buffer;
       output;
    end;
  keep x;
run;

Upvotes: 0

James Owers
James Owers

Reputation: 8335

To answer your question: the _N_ variable will return you the number of times that the data step has looped past the data statement. (http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695104.htm)

However, to solve your problem use lag() (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm) and a contains statement e.g.

data justBuffers;
    set yourDS;
    twoBefore = lag2(Buffer);
    oneBefore = lag1(Buffer);
    if Buffer ? '%' then do;
        if not missing(twoBefore) then do;
            x = twoBefore;
            output;
        end;
        if not missing(oneBefore) then do;
            x = oneBefore;
            output;
        end;
        x = Buffer;
        output;
        call missing(oneBefore, twoBefore);
    end;
    keep x;
run;

I've not tested the code so watch out! I'm sure you could make it smoother.

Upvotes: 0

Joe
Joe

Reputation: 63424

I like using point for this sort of thing. _N_ is reliable as a row counter as long as you're not doing anything funny with the data step loop.

data have;
length buffer $50;
input obs buffer $;
datalines;
75 14
76 13 
77 64 
78 38.1% 
79 29.2% 
80 69.2% 
81 33 
82 5-12
;;;;
run;
data want;
set have;
pointer=_N_;
if find(buffer,'%') then do;
  output;
  pointer=_N_-1;
  set have point=pointer;
  if not (find(buffer,'%')) then do;
    output;
    pointer=_N_-2;
    set have point=pointer;
    if not (find(buffer,'%')) then output;
  end; 
end;
run;

If you need to restore your order you can sort by pointer afterwards (or obs, if that is a real variable - I assume it is not). If obs is indeed a real variable (or if you make it real with a view), there is an interesting way in SQL to do this:

proc sql;
create table want as
    select H.* from have H 
        left join have V on H.obs=V.obs-1
        left join have A on H.obs=A.obs-2
        where 
            (find(H.buffer,'%'))
            or
            (find(V.buffer,'%'))
            or
            (find(A.buffer,'%'))
        order by H.obs
        ;
quit;

How to make obs real without a data pass:

data have_vw/view=have_vw;
set have;
obs=_n_;
run;

And then use have_vw instead of have in the SQL query (in all three spots).

Upvotes: 1

Related Questions