Nate
Nate

Reputation: 420

Differentiating between missing and total in output of proc means?

I've got something like the following:

proc means data = ... missing;
class 1 2 3 4 5;
var a b;
output sum=;
run;

This does what I want it to do, except for the fact that it is very difficult to differentiate between a missing value that represents a total, and a missing value that represents a missing value. For example, the following would appear in my output:

1 2 3 4 5 type sumA sumB
. . . . . 0 num num
. . . . . 1 num num

Ways I can think of handling this:

1) Change missings to a cardinal value prior to proc means. This is definitely doable...but not exactly clean.

2) Format the missings to something else prior, and then use preloadfmt? This is a bit of a pain...I'd really rather not.

3) Somehow use the proc means-generated variable type to determine whether the missing is a missing or a total

4) Other??

I feel like this is clearly a common enough problem that there must be a clean, easy way, and I just don't know what it is.

Upvotes: 1

Views: 187

Answers (2)

Nate
Nate

Reputation: 420

Joe's strategy, modified slightly for my exact problem, because it may be useful to somebody at some point in the future.

data want;
set have;
array classvars a b c d e;
do _t = 1 to dim(classvars);
  if char(_type_,_t) = 1 and (strip(classvars[_t] = "") or strip(classvars[_t]) = ".") then classvars[_t] = "TOTAL";
end;
run;

The rationale for the changes is as follows:

1) I'm working with (mostly) character variables, not numeric.

2) I'm not interested in whether a row has any missing or not, as those are very frequent, and I want to keep them. Instead, I just want the output to differentiate between the missings and the totals, which I have accomplished by renaming the instances of non-missing to something that indicates total.

Upvotes: 1

Joe
Joe

Reputation: 63424

Option 3, for sure . Type is simply a binary number with 1 for each class variable, in order, that is included in the current row and 0 for each one that is missing. You can use the CHARTYPE option to ask for it to be given explicitly as a string ('01101110' etc.), or work with math if that's more your style.

How exactly you use this depends on what you're trying to accomplish. Rows that have a missing value on them will have a type that suggests a class variable should exist, but doesn't. So for example:

data want;
set have; *post-proc means assuming used CHARTYPE option;
array classvars a b c d e;  *whatever they are;
hasmissing=0;
do _t = 1 to dim(classvars);
  if char(_type_,_t) = 1 and classvars[_t] = . then hasmissing=1;
end;
*or;
if cmiss(of classvars[*]) = countc(_type_,'0') then hasmissing=0;
else hasmissing=1; *number of 0s = number of missings = total row, otherwise not;
run;

That's a brute force application, of course. You may also be able to identify it based on the number of missings, if you have a small number of types requested. For example, let's say you have 3 class variables (so 0 to 7 values for type), and you only asked for the 3 way combination (7, '111') and the 3 two way combination 'totals' (6,5,3, ie, '110','101','011'). Then:

data want;
set have;
if (_type_=7 and cmiss(of a b c) = 0) or (cmiss(of a b c) = 1) then ... ; *either base row or total row, no missings;
else ... ; *has at least one missing;
run;

Depending on your data, NMISS may also work. That checks to see if the number of missings is appropriate for the type of data.

Upvotes: 1

Related Questions