Reputation: 105
I would like to achieve a similar outcome as pandas' info() in python. I want to see the number of elements for each column (missing / non-missing), like with proc contents, just for all of the variables. I'm sure there is a simple way, I just can't find it:) Could anyone help me with this?
Example for info( ):
df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 int_col 5 non-null int64
1 text_col 5 non-null object
2 float_col 5 non-null float64
Upvotes: 0
Views: 1238
Reputation: 51566
I don't know what INFO() function produces (you should include an example in your question) but it sounds like you want the number of distinct values (LEVELS) for the variables. PROC FREQ can do that.
proc freq nlevels data=sashelp.cars;
tables _all_ / noprint;
run;
Result
The FREQ Procedure
Number of Variable Levels
Missing Nonmissing
Variable Label Levels Levels Levels
--------------------------------------------------------------------
Make 38 0 38
Model 425 0 425
Type 6 0 6
Origin 3 0 3
DriveTrain 3 0 3
MSRP 410 0 410
Invoice 425 0 425
EngineSize Engine Size (L) 43 0 43
Cylinders 8 1 7
Horsepower 110 0 110
MPG_City MPG (City) 28 0 28
MPG_Highway MPG (Highway) 33 0 33
Weight Weight (LBS) 348 0 348
Wheelbase Wheelbase (IN) 40 0 40
Length Length (IN) 67 0 67
If you want to count missing/non-missing values then make a pair of special formats and use proc freq. SAS only has two types of variables, numbers and fixed length character strings. So make two formats, one for each type.
There is a worked example on SAS Communities : https://communities.sas.com/t5/SAS-Programming/Counting-missing-and-non-missing-obs/m-p/740272
*set input/output dataset names;
%let INPUT_DSN = sashelp.cars;
%let OUTPUT_DSN = want;
* create formats for missing;
proc format;
value $missfmt ' ' = "Missing" other = "Not_Missing";
value nmissfmt low-high ="Not_Missing" other="Missing";
run;
* turn off output and capture the one way freq table TEMP dataset ;
ods select none;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
* turn outputs back on ;
ods select all;
* Collapse to one observation per variable ;
data &OUTPUT_DSN ;
length name $32 missing not_missing total 8 ;
set temp;
by table notsorted ;
if first.table then call missing(of missing not_missing);
name = substr(table,7);
if vvaluex(name)='Missing' then missing=frequency;
else not_missing=frequency;
retain missing not_missing;
if last.table then do;
missing=sum(0,missing);
not_missing=sum(0,not_missing);
total=sum(missing,not_missing);
percent = divide(missing,total);
output;
end;
keep name missing not_missing total percent;
run;
Resulting data:
Upvotes: 2