Edmund-Blackadder
Edmund-Blackadder

Reputation: 13

Extract 6- and 8-digit numbers from text string in SAS

long time reader, first time poster

I’ve looked for the answer but nothing that is within my skill to convert to a solution. I'd appreciate any help!


I'm trying to extract numbers out of a text data set in SAS, so in ProcSQL or DATAstep.

I want to return groups of numbers from a free-text field.

This field contains either:

-at any point in the text, with or without text either side, of any length. For example:

REC    NOTES

001    Collateral 83948572 (code 56/56-55) open June 2013

002    Scoobydoo 12.12.12 88888888

003    54545454 over three years

I want to extract into the output:

8-digit no. if present     | 6-digit no. if present

83948572                   | 565655
88888888                   | 121212
54545454                   |

Can anyone suggest the direction I might look in?

Upvotes: 1

Views: 1032

Answers (2)

Shenglin Chen
Shenglin Chen

Reputation: 4554

Try this:

data have;
input REC $    NOTES $60.;
temp=prxchange('s/[a-z]+//i',-1,notes);
do i=1 to countw(temp);
   num=compress(scan(temp,i,' '),,'kd');
   if length(num)=8 then num8=num;
   else if length(num)=6 then num6=num;
end;
drop notes num i temp;
cards;
001    Collateral 83948572 (code 56/56-55) open June 2013
002    Scoobydoo 12.12.12 88888888
003    54545454 over three years
;
proc print ;
run;

Upvotes: 0

Matt
Matt

Reputation: 15071

Use the SUBSTRING, STUFF & PATINDEX functions.

SELECT REC, 
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('[0-9][0-9][0-9][0-9][0-9][0-9]', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 6)AS "6digit",
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 8) AS "8digit"
FROM yourtable

Upvotes: 0

Related Questions