gdogg371

Reputation: 4152

Two pattern groupings with PRXPARSE regex?

Following on from a question I asked earlier today, I now have my state-scanning regex doing exactly what I want, and I now want to add state codes to it. I want the state names to be case insensitive and the state codes to be case sensitive, so I have put two pattern groups with different case settings in the regex.

Each group works as expected when used on its own in a regex. However, when I use both groups together, only the second one, for state names, finds a match. The code is below:

options noquotelenmax;

data countries;
do i = 1 to 20;
output;
end;
run;

data countries;
length state $50.;
set countries;

if i = 1 then state = 'CALIFORNIA';
if i = 2 then state = 'alabama';
if i = 3 then state = 'NewYork';
if i = 4 then state = 'OHIO';
if i = 5 then state = 'ohio';
if i = 6 then state = 'FLORIDA';
if i = 7 then state = 'georgia';
if i = 8 then state = 'TEXAS';
if i = 9 then state = 'Kansas';
if i = 10 then state = 'MAINE';

if i = 11 then state = 'AL';
if i = 12 then state = 'AK';
if i = 13 then state = 'CO';
if i = 14 then state = 'MT';
if i = 15 then state = 'OH';
if i = 16 then state = 'SD';
if i = 17 then state = 'PA';
if i = 18 then state = 'IA';
if i = 19 then state = 'PW';
if i = 20 then state = 'AP';

run;

data countries;
set countries;
prx_1 = (prxparse(
"/^(?:AL|AK|AZ|AR|
CA|CO|CT|DE|
DC|FL|GA|HI|
ID|IL|IN|IA|
KS|KY|LA|ME|
MD|MA|MI|MN|
MS|MO|MT|NE|
NV|NH|NJ|NM|
NY|NC|ND|OH|
OK|OR|PA|RI|
SC|SD|TN|TX|
UT|VT|VA|WA|
WV|WI|WY|AS|
GU|MP|PR|VI|
UM|FM|MH|PW|
AA|AE|AP|CM|
CZ|NB|PI|TT|)(?i:Alabama|Alaska|Arizona|Arkansas|
California|Colorado|Connecticut|Delaware|
District\s*of\s*Columbia|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|
Kentucky|Louisiana|Maine|Maryland|
Massachusetts|Michigan|Minnesota|Mississippi|
Missouri|Montana|Nebraska|Nevada|
New\s*Hampshire|New\s*Jersey|New\s*Mexico|
New\s*York|North\s*Carolina|North\s*Dakota|
Ohio|Oklahoma|Oregon|Pennsylvania|
Rhode\s*Island|South\s*Carolina|South\s*Dakota|
Tennessee|Texas|Utah|Vermont|Virginia|
Washington|West\s*Virginia|Wisconsin|Wyoming|
American\s*Samoa|Guam|Northern\s*Mariana\s*Islands|
Puerto\s*Rico|Virgin\s*Islands|
U\s*S\s*\s*Minor\s*Outlying\s*Islands|
Federated\s*States\s*of\s*Micronesia|Marshall\s*Islands|
Palau)$/"));
prx_valid_addr_1 = (prxmatch(prx_1, strip(state))) ;
run;

options quotelenmax;

Can anyone see what I am doing wrong?

Thanks

Upvotes: 0

Views: 83

Answers (1)

Leo

Reputation: 2135

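You have an extra "|" after "TT", which adds an empty alternative to the first group, and there is no "|" between the two groups. As written, the pattern requires a state code (or nothing at all) followed immediately by a state name, which is why only the state names match. The following should work: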

data countries;
set countries;
/* The outer (?: ... ) groups both alternatives so that ^ and $ anchor each of them. */
prx_1 = (prxparse(
"/^(?:(?:AL|AK|AZ|AR|
CA|CO|CT|DE|
DC|FL|GA|HI|
ID|IL|IN|IA|
KS|KY|LA|ME|
MD|MA|MI|MN|
MS|MO|MT|NE|
NV|NH|NJ|NM|
NY|NC|ND|OH|
OK|OR|PA|RI|
SC|SD|TN|TX|
UT|VT|VA|WA|
WV|WI|WY|AS|
GU|MP|PR|VI|
UM|FM|MH|PW|
AA|AE|AP|CM|
CZ|NB|PI|TT)|(?i:Alabama|Alaska|Arizona|Arkansas|
California|Colorado|Connecticut|Delaware|
District\s*of\s*Columbia|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|
Kentucky|Louisiana|Maine|Maryland|
Massachusetts|Michigan|Minnesota|Mississippi|
Missouri|Montana|Nebraska|Nevada|
New\s*Hampshire|New\s*Jersey|New\s*Mexico|
New\s*York|North\s*Carolina|North\s*Dakota|
Ohio|Oklahoma|Oregon|Pennsylvania|
Rhode\s*Island|South\s*Carolina|South\s*Dakota|
Tennessee|Texas|Utah|Vermont|Virginia|
Washington|West\s*Virginia|Wisconsin|Wyoming|
American\s*Samoa|Guam|Northern\s*Mariana\s*Islands|
Puerto\s*Rico|Virgin\s*Islands|
U\s*S\s*\s*Minor\s*Outlying\s*Islands|
Federated\s*States\s*of\s*Micronesia|Marshall\s*Islands|
Palau))$/"));
prx_valid_addr_1 = (prxmatch(prx_1, strip(state))) ;
run;
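
To see the behaviour on something small, here is a minimal sketch with deliberately shortened lists (three codes and three names, chosen only for illustration). The data set name demo and the variables prx_id and matched are hypothetical and not part of the original code. It wraps both alternatives in one outer non-capturing group so that ^ and $ apply to each of them, and it compiles the pattern once on the first iteration rather than on every row:

```sas
/* Sketch only: shortened, hypothetical lists of codes and names. */
data demo;
    length state $50;
    input state $char50.;

    /* Compile the pattern once and keep its ID across iterations. */
    retain prx_id;
    if _n_ = 1 then prx_id = prxparse(
        '/^(?:(?:AL|OH|NY)|(?i:Alabama|Ohio|New\s*York))$/');

    /* PRXMATCH returns the match position, or 0 when nothing matches. */
    matched = prxmatch(prx_id, strip(state)) > 0;

    datalines;
OH
oh
ohio
New York
OHIOX
;
run;

proc print data=demo;
    var state matched;
run;
```

With this pattern I would expect OH, ohio and New York to be flagged as matches, and oh and OHIOX to be rejected. The last row is the reason for the outer group: a top-level "|" has the lowest precedence, so in a pattern of the form /^(codes)|(names)$/ the ^ only anchors the codes and the $ only anchors the names, and OHIOX would match through the leading OH.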

Upvotes: 3
