Reputation: 981
How would you explain to someone how much a "byte" is in the LENGTH statement? I always thought 1 byte equaled 1 character or 1 number, but that doesn't seem to be the case. Also, why is its syntax different from the syntax for the FORMAT statement? For example:
/*FORMAT Statement Syntax*/
FORMAT variable_name $8.;
/*LENGTH Statement*/
LENGTH variable_name $ 8;
Upvotes: 2
Views: 1469
Reputation: 51566
The syntax is different because they do different things. The LENGTH statement defines the type of the variable and how much room it takes to store the variable in the dataset. The FORMAT statement defines which format you want to attach to the variable so that SAS knows how to transform the value when writing it out to the log or output window.
The $ in the LENGTH statement means you are defining a character variable. The $ in a FORMAT statement is just part of the name of the format that you are attaching to the variable. Formats that can be used with character variables start with a $ and numeric formats do not. Formats need to have a period so that SAS can distinguish them from variable names. The lengths used in a LENGTH statement are integers, so periods are not needed (although SAS will ignore them if you add them after the integer value).
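As a minimal sketch of that split (the dataset and variable names are made up for illustration), the step below uses LENGTH to set the type and storage and FORMAT to set only the display:

```sas
/* Hypothetical names: DEMO and NAME are illustrative only */
data demo;
  length name $ 20;   /* character variable, 20 bytes of storage */
  format name $10.;   /* display format only; storage stays at 20 bytes */
  name = 'Abcdefghijklmnop';
  put name=;          /* the log line is written through $10. */
run;

proc contents data=demo;  /* lists Len 20 and Format $10. for NAME */
run;
```

The full 16-character value is still stored in the dataset; the attached format only limits how much of it is shown.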
I see a lot of confusion in SAS code where the FORMAT statement is used as if it were intended to define variables. This only works because SAS will guess at how to define a variable the first time it appears in the data step, so it will use the details of the format you are attaching to guess at what type of variable you mean. If you first reference X in an assignment statement such as x=2+3 then SAS will guess that X should be numeric and give it the default length of 8. But if the first place it sees X is in a format statement like format x $10. then it will guess that you wanted to make X a character variable with length 10 to match the width of the format.
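A small sketch of that guessing behavior, using made-up dataset and variable names, compared with an explicit LENGTH statement:

```sas
/* Hypothetical names: GUESS, X, Y and Z are illustrative only */
data guess;
  format x $10.;   /* first reference to X: character, length 10 */
  y = 2 + 3;       /* first reference to Y: numeric, length 8 */
  length z $ 25;   /* explicit definition: character, length 25 */
  format z $10.;   /* Z keeps length 25; the format affects display only */
  x = 'hello';
  z = 'a longer value';
run;

proc contents data=guess;  /* compare the Type, Len and Format columns */
run;
```

Defining variables with LENGTH first keeps the stored length independent of whichever format happens to be attached.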
As to how characters are represented and stored, it depends on what encoding you are using. If you are only using simple 7-bit ASCII codes then there is a one-to-one relationship between characters and the number of bytes it takes to store them. But if you are using UTF-8 it can take up to 4 bytes to store a single character.
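To see the difference between bytes and characters in your own session, one option is to compare the LENGTH function (which counts bytes) with KLENGTH (which counts characters). This is only a sketch: the variable names are made up, and the result depends on the session encoding and on how the program file itself is encoded.

```sas
/* Show the session encoding so the counts below can be interpreted */
%put Session encoding: %sysfunc(getoption(encoding));

data _null_;
  length s $ 8;        /* 8 bytes of storage, not necessarily 8 characters */
  s = 'café';          /* the accented letter is not 7-bit ASCII */
  bytes = length(s);   /* byte position of the last non-blank byte */
  chars = klength(s);  /* number of characters */
  put bytes= chars=;
run;
```

In a UTF-8 session I would expect bytes=5 and chars=4, while in a single-byte session such as Latin1 both counts should be 4.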
For numeric variables SAS uses the IEEE 64-bit floating-point format, so the relationship between the LENGTH used to store the variable and the width of a format used to display it is much more complex. It is best to just define all numeric variables as length 8. SAS will allow you to define numeric variables with a length of less than 8 bytes, but that just means it throws away the trailing bytes, and the precision they hold, when writing the values to the SAS dataset. When storing integers you can do this without loss of precision as long as there are enough bits left to store the largest number you expect. For floating-point values you will lose precision.
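A hedged sketch of that truncation, with made-up names. The values are written out with short lengths and then read back in a second step, because the truncation happens when the observation is written to the dataset, not while the data step is still working on it. The integer limits in the comments assume an IEEE host such as Windows or UNIX.

```sas
/* Hypothetical names: SHORT, INT_OK, INT_BAD, FRAC and FULL are illustrative */
data short;
  length int_ok 3 int_bad 3 frac 3 full 8;
  int_ok  = 8191;   /* fits: length 3 holds integers exactly up to 8,192 */
  int_bad = 8193;   /* too large to be stored exactly in 3 bytes */
  frac    = 0.1;    /* not exactly representable; low-order bits are dropped */
  full    = 0.1;    /* same value stored with all 8 bytes */
run;

data _null_;
  set short;              /* values come back as stored, padded to 8 bytes */
  diff = full - frac;     /* nonzero if precision was lost on write */
  put int_ok= int_bad= frac= full= diff=;
run;
```

I would expect INT_OK to come back unchanged, INT_BAD to come back as a different integer, and DIFF to be a small nonzero number.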
Upvotes: 4