Bob

Reputation: 121

SAS Converting Characters/Number to Numbers

I am looking for a way to convert character values into numbers in SAS so that I can use the max function. Ideally, the letters would be removed and only the digits kept. Below is a list of values from a column in a SAS table.

Column UNK
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718

After extracting the numeric part of each value, I would then group the data and use the max function in SAS, which should return only the value 20150718.

To avoid any confusion, my question is: is there a way to strip out the non-numeric characters and then convert the column to a numeric column so I can use the max function?

Thanks.

Upvotes: 1

Views: 1255

Answers (2)

catquas

Reputation: 720

To extract the first run of 8 consecutive digits that starts with a 1 or a 2 and return it as a numeric value, you can use the following:

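data want;
    set have;
    /* Position of the first 8-digit run starting with 1 or 2 (0 if none) */
    pos = prxmatch("/[12]\d{7}/", character_string);
    /* Extract those 8 characters and read them as a number */
    if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
    else number = .;
    drop pos;
run;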

PRXMATCH returns the starting position of the match (0 if there is none), SUBSTR extracts the 8-digit sequence from that position, and INPUT converts it to a numeric value.

(Edited to incorporate Joe's feedback)
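As a usage sketch, here is the step above run against the sample values from the question, followed by a PROC SQL query that takes the maximum. Loading the values with DATALINES is an assumption for illustration; have, want and character_string are the answer's own placeholder names, and max_number is hypothetical.

```sas
/* Sample values from the question (loading them via DATALINES is an assumption) */
data have;
    length character_string $ 40;
    input character_string $;
    datalines;
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
;
run;

/* Same logic as the answer: first 8-digit run starting with 1 or 2 */
data want;
    set have;
    pos = prxmatch("/[12]\d{7}/", character_string);
    if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
    else number = .;
    drop pos;
run;

/* MAX ignores missing values; with this data it should return 20150718 */
proc sql;
    select max(number) as max_number
    from want;
quit;
```

Because the pattern requires 8 consecutive digits, the leading 123 in 123_abc20140714_xyz is skipped and that row still yields 20140714.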

Upvotes: 1

Joe

Reputation: 63434

Sure!

var_num = input(compress(var_char,,'kd'),yymmdd8.);

COMPRESS removes the characters in a list or, with the 'k' modifier, keeps them instead. 'kd' says to 'keep digits'.

You then read the result with the appropriate informat; yymmdd8. looks right based on the data you provided. Then apply a format (format var_num yymmddn8.; or similar) so it looks like a date visually, even though it's really a number underneath.
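A self-contained sketch of this approach on the question's sample values might look like the following. The dataset names have and want, the DATALINES load, and the result name max_date are assumptions for illustration. The fourth row comes out missing, because COMPRESS also keeps its leading 123 and the first 8 digits, 12320140, are not a valid yyyymmdd date; expect an invalid-argument note in the log for that row.

```sas
/* Sample values from the question (loading them via DATALINES is an assumption) */
data have;
    length var_char $ 40;
    input var_char $;
    datalines;
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
;
run;

data want;
    set have;
    /* Keep only the digits, then read the first 8 as a yyyymmdd date */
    var_num = input(compress(var_char, , 'kd'), yymmdd8.);
    /* Stored as a SAS date (days since 01JAN1960), displayed as yyyymmdd */
    format var_num yymmddn8.;
run;

/* MAX skips the missing value; format= makes the result display as 20150718 */
proc sql;
    select max(var_num) as max_date format=yymmddn8.
    from want;
quit;
```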

As pointed out, this won't work if the values contain other digits (for example 123_abc20140714_xyz); you need to look at your data, identify how those digits appear, and clean them out separately. You could, for example, use a regular expression to identify values that contain 8 consecutive digits starting with 20, but ultimately this is a data analysis issue, so handle these cases as your data require.
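One way to sketch the regular-expression idea is to capture the first 8 consecutive digits that start with 20 using PRXPARSE and PRXPOSN, then read them with the yymmdd8. informat. The variable name unk (taken from the column label in the question), the DATALINES load, and the names re, date and max_date are assumptions; the ?? modifier suppresses log notes if a captured run is not a valid date.

```sas
/* Sample values from the question; the variable name UNK is assumed */
data have;
    length unk $ 40;
    input unk $;
    datalines;
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
;
run;

data want;
    set have;
    /* Compile the pattern once: capture 8 digits that start with 20 */
    retain re;
    if _n_ = 1 then re = prxparse('/(20\d{6})/');
    if prxmatch(re, unk) then
        /* ?? suppresses notes if the captured digits are not a valid date */
        date = input(prxposn(re, 1, unk), ?? yymmdd8.);
    else date = .;
    format date yymmddn8.;
    drop re;
run;

/* Overall maximum, displayed as yyyymmdd (20150718 for this data) */
proc sql;
    select max(date) as max_date format=yymmddn8.
    from want;
quit;
```

Unlike the COMPRESS version, this also recovers 20140714 from 123_abc20140714_xyz, since the leading 123 is not followed by 0 and so cannot start a match.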

Upvotes: 4
