Reputation: 113
I am writing a SAS query to QA some data views. Part of the QA is determining what percentage of the values are populated. Unfortunately SAS is counting empty character cells as populated rather than NULL or having no data. For example, an ID field has some blank cells and I run a COUNT() function, I get the same result as though I ran a COUNT(*) If I run a CASE WHEN statement to exclude "" values I get the correct results, but needing to do that for every single text field in the SAS query seems like overkill and that I'm not aware of some function or some way to preprocess the data so that the COUNT function won't count empty cells.
Some example data that provide the idea is:
data QA_Test;
Input Name $ ID_Number;
Robert 1AY
Shirley ""
Tammy XB3
Upvotes: 1
Views: 764
Reputation: 63424
You can use compress
also to remove them, inside the count
, if it's correct to leave them in but you don't want them to count:
proc sql;
select count(compress(id,'"'))
...
;
quit;
Upvotes: 0
Reputation: 51566
Use the DSD option when reading values that have quotes around them from a text file.
data QA_Test;
infile cards dsd dlm=' ' truncover;
input Name $ ID_Number $;
cards;
Robert 1AY
Shirley ""
Tammy XB3
;
Now ID_NUMBER will not contain the quotes.
Or use a period to represent the missing values in your text file.
data QA_Test;
input Name $ ID_Number $;
cards;
Robert 1AY
Shirley .
Tammy XB3
;
If you already have those '""'
strings in your data and you don't want to count them then use a different method of counting.
sum(not (id_number in (' ','""')))
Upvotes: 1
Reputation: 21274
You need to provide a reproducible example. Please follow the instructions here or use the hex example I previously showed.
So somewhat fixing the non-working code you posted, I did this:
data QA_Test;
Input Name $ ID_Number $;
cards;
Robert 1AY
Shirley ""
Tammy XB3
;;;;
run;
proc sql;
select count(*) as total_count, count(Id_number) as n_id
from QA_TEST;
quit;
Results:
total_count n_id
3 3
But this creates a data set with actual quotes in the cell, I'm assuming that isn't the case in your actual data? So if I read it in as missing:
data QA_Test;
infile cards truncover;
Input Name $ ID_Number $;
cards;
Robert 1AY
Shirley
Tammy XB3
;;;;
run;
proc sql;
select count(*) as total_count, count(Id_number) as n_id
from QA_TEST;
quit;
Results in:
total_count n_id
3 2
So I think SAS is right, your data quality tests are correct and your data has data quality issues that need to be resolved - specifically in this case, fields that has likely tab or invisible characters in the data.
You can test this with the following and post your output here or on communities.sas.com.
proc freq data=qa_test;
table id_number / out=check missing;
format Id_number $hex.;
run;
Upvotes: 0