Reputation: 13
long time reader, first time poster
I’ve looked for the answer but nothing that is within my skill to convert to a solution. I'd appreciate any help!
I'm trying to extract numbers out of a text data set in SAS, so in ProcSQL or DATAstep.
I want to return groups of numbers from a free-text field.
This field contains either:
-at any point in the text, with or without text either side, of any length. For example:
REC NOTES
001 Collateral 83948572 (code 56/56-55) open June 2013
002 Scoobydoo 12.12.12 88888888
003 54545454 over three years
I want to extract into the output:
8-digit no. if present | 6-digit no. if present
83948572 | 565655
88888888 | 121212
54545454 |
Can anyone suggest the direction I might look in?
Upvotes: 1
Views: 1032
Reputation: 4554
Try this:
data have;
input REC $ NOTES $60.;
temp=prxchange('s/[a-z]+//i',-1,notes);
do i=1 to countw(temp);
num=compress(scan(temp,i,' '),,'kd');
if length(num)=8 then num8=num;
else if length(num)=6 then num6=num;
end;
drop notes num i temp;
cards;
001 Collateral 83948572 (code 56/56-55) open June 2013
002 Scoobydoo 12.12.12 88888888
003 54545454 over three years
;
proc print ;
run;
Upvotes: 0
Reputation: 15071
Use the SUBSTRING
, STUFF
& PATINDEX
functions.
SELECT REC,
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('[0-9][0-9][0-9][0-9][0-9][0-9]', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 6)AS "6digit",
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 8) AS "8digit"
FROM yourtable
Upvotes: 0