Reputation: 35

show all values in categorical variable

The google search has been difficult for this. I have two categorical variables, age and months, with 7 levels each. for a few levels, say age =7 and month = 7 there is no value and when I use proc sql the intersections that do not have entries do not show, eg:

    age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     THIS LINE DOESNT SHOW

what i want

   age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     7       7    0

this happens a few times in the data, where tha last groups dont have value so they dont show, but I'd like them to for later purposes

Upvotes: 1

Answers (3)

Reeza

Reputation: 21274

You have a few options available, both seem to work on the premise of creating the master data and then merging it in. Another is to use a PRELOADFMT and FORMATs or CLASSDATA option.

And the last - but possibly the easiest, if you have all months in the data set and all ages, then use the SPARSE option within PROC FREQ. It creates all possible combinations.

proc freq data=have;
   table age*month /out = want SPARSE;
   weight value;
run;

Upvotes: 1

Richard

Reputation: 27508

The group independent crossing of the classification variables requires a distinct selection of each level variable be crossed joined with the others -- this forms a hull that can be left joined to the original data. For the case of age*month having more than one item you need to determine if you want

rows with repeated age and month and original value
rows with distinct age and month with either
- aggregate function to summarize the values, or
- an indication of too many values

data have;
input age   month value;
datalines;
     1       1    4
     2       1    12
     3       1    5
     7       1    6
     1       7    8
     5       7    44
     6       7    5 
     8       8    1
     8       8    11
run;

proc sql;
  create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
  as
  select 
    allAges.age
  , allMonths.month
  , coalesce(have.value,0) as value
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on 
    have.age = allAges.age and have.month = allMonths.month
  order by 
    allMonths.month, allAges.age
  ;
quit;

And a slight variation that marks duplicated class crossings

proc format;
  value S_V_V .t = 'Too many source values'; /* single valued value */
quit;

proc sql;
  create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
  as
  select distinct
    allAges.age
  , allMonths.month
  , case 
      when count(*) = 1 then coalesce(have.value,0)
      else .t
    end as value format=S_V_V.
  , count(*) as dup_check
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on 
    have.age = allAges.age and have.month = allMonths.month
  group by
    allMonths.month, allAges.age
  order by 
    allMonths.month, allAges.age
  ;
quit;

This type of processing can also be done in Proc TABULATE using the CLASSDATA= option.

Upvotes: 0

DomPazz

Reputation: 12465

First some sample data:

data test;
do age=1 to 7;
    do month=1 to 12;
        value = ceil(10*ranuni(1));
        if ranuni(1) < .9 then
            output;
    end;
end;
run;

This leaves a few holes, notably, (1,1).

I would use a series of SQL statements to get the levels, cross join those, and then left join the values on, doing a coalesce to put 0 when missing.

proc sql;
create table ages as 
select distinct age from test;

create table months as 
select distinct month from test;

create table want as
select a.age,
       a.month,
       coalesce(b.value,0) as value
    from (
            select age, month from ages, months
         ) as a
      left join
         test as b
      on a.age = b.age
       and a.month = b.month;
quit;

Upvotes: 0

show all values in categorical variable

Answers (3)

Related Questions