user190080
user190080

Reputation: 535

Formating character 'decimals' (comma delimiter) AND character 'integers' to numeric 'decimals' (point delimiter)

This is somewhat related to my other question recently.

Setup I am reading in character variables of the sort 1 or 2,0 or 10,0 or 2,5. I want to convert them to numerics using a decimal point instead of a comma. So ideally I would like to get the following result:

1    -> 1
2,0  -> 2
10,0 -> 10
2,5  -> 2.5

My code

data _null_;
    test='5,0';
    result=input(test_point,comma10.1);
    put 'this should be:' result;
run; 

does this for all character variables which are of the type 'xy,z' but fails for 'xy' with no comma separation at all. Here I would get

xy  ->  x,y

I was thinking to add an if/else to check whether the character string has length of 1 or bigger. So something like

data _null_;
    test='5';
     if length(test)=1 then result=input(test, comma10.);
     else result=input(test, comma10.1);
    put 'this should be:' result;
run;

But the problem here would be that

10  ->  1

Problems with like 10,00 (which is supposed to be 10) becoming 100 could probably be resolved by substituting the ',' with '.', but the characters with no decimal delimiter remain a problem.

Is there any clever solution to this?

My solution which is a bit hacky (and basically only uses the fact that the comma introduces a length>2 - problems with e.g. 123 would still arise):

 data _null_;
     t='5,5';
     test=tranwrd(t, ',', '.');
     if length(test)=1 or length(test)=2 then result=input(test, comma10.);
     else result=input(test, comma10.1);
    put 'this should be:' result;
 run;

Upvotes: 0

Views: 2686

Answers (2)

Tom
Tom

Reputation: 51566

Sounds like your text strings were created in a place where the normal meaning of comma and period in numbers is reversed. So instead of using a period for decimal point and comma for thousand grouping they have reversed the meaning.

For that type of strings SAS has the COMMAX informat.

Normally you do NOT want to add a decimal specification to your informat. The decimal part of the informat is only used when the source string does not have a explicit decimal point. Basically it is telling SAS to divide values without an explicit decimal point by 10 to the power of the number of decimal places in the informat specification. It is designed to read data where the decimal point was purposely not written in order to save space.

Pretty much all the COMMA informat does is strip the string of commas and dollar signs and then read it using the normal numeric informat.

The COMMAX informat is the one that will understand the reversed meaning of the commas and periods. So it pretty much eliminates the periods and then converts the commas to periods and then reads it using the normal numeric informat.

Try a little test of your own.

data check;
  input @1 string $32. @1 num ??32.  @1 comma ??comma32. @1 commax ??commax32. 
        @1 d2num ??32.2 @1 d2comma ??comma32.2 @1 d2commax ??commax32.2
  ;
cards;
123
123.4
123,4
1,234.5
1.234,5
;
proc print;
run;

enter image description here

Upvotes: 2

user190080
user190080

Reputation: 535

As it turns out (found it here) the COMMAXw,d does the trick without any hassle, the code then would be:

data _null_;
    test='0,5';
    result = input(test, COMMAX10.);
    put 'this should be:' result;
run;

I find it a bit anti-intuitive, but it works.

Upvotes: 1

Related Questions