Reputation: 313
I have a blood.txt dataset like this (first 5 obs):
1    Female AB Young 7710   7.4  258
2    Male   AB Old   6560   4.7  .
3    Male   A  Young 5690   7.53 184
4    Male   B  Old   6680   6.85 .
5    Male   A  Young .      7.72 187
I used the following program to read it:
data blood_sum;
infile "/path/blood.txt";
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
But the last column, "Cholesterol", is not read correctly: every value comes out as missing ("."). My log is full of NOTEs like this:
NOTE: Invalid data for Cholesterol in line 1 34-37.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1   CHAR  1    Female AB Young 7710   7.4  258. 37
    ZONE  3222246666624425676623333222323223330
    NUMR  1000065D1C501209F5E7077100007E400258D
SubjID=1 Gender=Female BloodType=AB AgeGrp=Young RBC=7710 WBC=7.4 Cholesterol=. _ERROR_=1
Can anyone help?
Upvotes: 1
Views: 736
Reputation: 63434
I'll offer a slightly different solution. I agree with Bob that the problem is caused by the carriage return at the end of each line.
You can control the terminating character for a line (normally, for Windows, CR/LF or '0d'x '0a'x ; for Unix, '0a'x or LF only) with the TERMSTR= option on the INFILE statement.
http://support.sas.com/kb/14/178.html
data blood_sum;
infile "/path/blood.txt" termstr=CRLF;
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
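By the way, I find your input method a bit confusing. You're mixing input styles (column pointers with list input), so you might not always get consistent results. List input reads each value up to the next blank, which is why the carriage return in column 37 ended up as part of Cholesterol. In fact, this probably would never have happened if you had explicitly assigned informats with widths: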
input
@1 subjid $4.
@6 gender $6.
@13 bloodtype $2.
@16 agegrp $5.
@22 rbc best7.
@29 wbc best4.
@34 Cholesterol 3.
;
Then Cholesterol would be read from columns 34-36, and SAS would never have tried to include column 37 in the value.
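To see the column arithmetic outside SAS, here is a rough shell sketch. It only mimics the column positions with cut; it does not reproduce how SAS informats parse numbers. The sample line is rebuilt from the hex dump in the question's log, and the variable names are made up for illustration.

```shell
#!/bin/sh
# One record as it sits in the file: 36 data columns plus a carriage return in column 37.
line=$(printf '1    Female AB Young 7710   7.4  258\r')

# A width of 3 starting at column 34 covers only the digits.
width3=$(printf '%s\n' "$line" | cut -c34-36)

# Reading through column 37 also picks up the carriage return;
# od -c makes the otherwise invisible byte visible as \r.
through37=$(printf '%s\n' "$line" | cut -c34-37 | od -An -c)

echo "columns 34-36: $width3"
echo "columns 34-37: $through37"
```

The second echo should show a \r after the three digits, which is the byte SAS choked on when it read up to the next blank.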
Upvotes: 0
Reputation: 9618
I'm going to guess that you are running this on a UNIX system but the file you are reading (blood.txt) was created on a Windows system and copied to your system in binary mode.
If you look at the log, you should notice there is a "dot" after the last value in your input line (in column 37). The ZONE and NUMR parts of the display reveal the hex code for that position, in this case '0D', which is a carriage return character. If you open the file with a UNIX editor (like vi), you will see those characters represented as ^M at the end of each line.
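If you would rather check from the command line than open an editor, something like the following should work on most UNIX systems. It is only a sketch: it builds a throwaway sample file (the temp path and variable names are placeholders, not your real blood.txt) and then looks for the carriage returns.

```shell
#!/bin/sh
# Create a small sample file with Windows (CRLF) line endings.
sample=$(mktemp "${TMPDIR:-/tmp}/blood.XXXXXX")
printf '1    Female AB Young 7710   7.4  258\r\n' >  "$sample"
printf '2    Male   AB Old   6560   4.7  .\r\n'   >> "$sample"

# Count the lines that contain a carriage return (hex 0D).
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$sample")
echo "lines with CR: $cr_lines"

# Dump the first line byte by byte; a CRLF ending shows up as \r followed by \n.
head -n 1 "$sample" | od -c

rm -f "$sample"
```

For this two-line sample the count is 2. A count of 0 on your own file would mean the line endings are not the problem.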
You can either download a fresh copy from wherever you received it (making sure to transfer the file in TEXT mode) or you can convert your copy to a UNIX text file. To convert, you can use the dos2unix command like this:
dos2unix /path/blood.txt /path/blood.txt
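Note that if you use the same name for input and output, it will overwrite the original file. Of course, I assume you have permission to do that. The exact syntax depends on which dos2unix you have: the version shipped with most Linux distributions converts in place when given just the file name, and needs -n infile outfile to write a separate file, so check the man page on your system.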
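If you would rather not overwrite anything, one option is to write the converted text to a new file with tr and check the result before pointing SAS at it. A minimal sketch, assuming a throwaway sample file; the paths and variable names here are placeholders:

```shell
#!/bin/sh
# Sample input with CRLF line endings (stands in for the real file).
src=$(mktemp "${TMPDIR:-/tmp}/blood.XXXXXX")
printf '1    Female AB Young 7710   7.4  258\r\n' > "$src"

# Strip every carriage return and write the result to a separate file,
# leaving the original untouched.
dst="$src.unix"
tr -d '\r' < "$src" > "$dst"

# Confirm that no carriage returns remain in the converted copy.
# grep -c exits non-zero when the count is 0, hence the "|| true".
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$dst" || true)
echo "lines still containing CR: $remaining"

rm -f "$src" "$dst"
```

This is the same tr command used in the pipe approach, just run once up front instead of every time the data step runs.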
In case you cannot convert the file for some reason, you can use a pipe to do the conversion. In other words, use this FILENAME statement and change your INFILE statement to read from the filename:
filename mydata pipe "tr -d '\r' < /path/blood.txt";
data blood_sum;
infile mydata truncover;
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
I added the truncover option although you may not need it. Read more about it in the docs if interested.
By the way, this is a very common error and happens to everyone at least once. Welcome to StackOverflow.
Upvotes: 2