sjmeinholz
sjmeinholz

Reputation: 25

SAS simple row count

I have a large data set which has all states included, currently looking at NE. I am looking at each state individually and I want to count the number of rows that are included in each state. I am currently using:

record_num = _N_;

BUT This gives me the row count of the first row which NE is included. Example, row 49752 is the first record_num:

49752
49753
49754
49756
49757
49758

I want:

1
2
3
4
5
6

I want this to be equal to 1 instead. I could do:

record_num=_N_;
num = record_num-49751;

But I don't want to alter this code every time I change the state which would change the record_num.

Upvotes: 1

Views: 10437

Answers (2)

Joe
Joe

Reputation: 63424

SAS by-group processing is what you need to be using.

For example:

data want;
  set have;
  by state;
  if first.state then counter=0;
  counter+1;
run;

What that does is when STATE changes, the last record before the change has a variable LAST.STATE set to 1 (otherwise has a 0), and the next record (The first for the new value of state) has the variable FIRST.STATE set to 1 (otherwise 0).

So we set counter to 0 each time a new state is encountered, and increment it (var+1; increments the variable and retains it across rows).

If you want to only have the actual count, as vasja notes in comments, then add if last.state; before the run (which will only give you one row per state, the last one).

Upvotes: 1

Robert Penridge
Robert Penridge

Reputation: 8513

Some more alternatives to @Joe's answer. You can use proc means to calculate the number of rows for each state. I've used the sashelp.zipcode dataset as an example. The nway option tells it to summarize by all of the variables in the class statement. The n= option on the output statement tells it just to calculate the number of observations for the class variables (you could also choose other aggregate functions such as mean, sum, etc.).

proc means data=sashelp.zipcode nway missing noprint;
  class statecode;
  output out=want(keep=statecode _freq_) n=;
quit;

There's also the SQL approach which is my personal favorite as it's a widely understood syntax that can be used in multiple programming languages:

proc sql noprint;
  create table want as
  select statecode,
         count(*) as num_records
  from sashelp.zipcode
  group by 1
  order by 1
  ;
quit;

Upvotes: 2

Related Questions