Harlan Nelson

Reputation: 1502

SAS error using prxmatch with DS2

It appears to me that prxmatch misbehaves when used in SAS DS2, although it is also possible that my code is wrong. I want to know whether this problem is caused by my code or by a bug in SAS.

In the code below, I match each search term from one data table against the search text in another.

data master_table;
    input name $ search_text $;
    datalines;
Frank   allHere
John    Sales
Mary    Acctng
Joe     Findme
Sue     Hereiam
Jim     graccaa
;
run;

proc print data= master_table; run;

data search_term_table;
    infile datalines missover;
    input id $ search_term $;
datalines;
1   Here
2   Find
3   Acc
;
run;

proc ds2;
    data search_results (overwrite=yes);
    retain rc;
    dcl double rc c ;
    declare char(8) id N;
    declare char(11) name;
    declare char(1) c_options;
    declare char(20) search_term search_text;
    dcl package hash h(1, 'search_term_table');
    dcl package hiter hi('h');
        method init();
            rc = h.keys([id]);
            rc = h.data([id search_term]);
            rc = h.defineDone();
        end;
        method run();
            dcl double rc;
            set master_table;
            if _N_ = 1 then put 'ROW    ITEM';
            N = _N_;
            rc = hi.first();
            do while(rc=0);
                c_options = 'i';
                search_term = cats('/', search_term, '/', c_options);
                search_text = catx(' ', search_text);
                c = prxmatch(search_term, search_text);
                put N id 'prxmatch(' search_term ',' search_text '); ---> ' c;
                output;
                rc = hi.next();
            end;
        end;
    enddata;
run;
quit;

The results of the put statement are shown below.

In ROW 3 ITEM 1 a match is incorrectly found, because it is using the regex from the last item of the previous row, not the current one.

In ROW 5 ITEM 1 the situation is reversed. A match is not found because, again, it is using the regex from the last item of the previous row.

ROW    ITEM
1        1        prxmatch( /Here/i              , allHere              ); --->  4
1        2        prxmatch( /Find/i              , allHere              ); --->  0
1        3        prxmatch( /Acc/i               , allHere              ); --->  0
2        1        prxmatch( /Here/i              , Sales                ); --->  0
2        2        prxmatch( /Find/i              , Sales                ); --->  0
2        3        prxmatch( /Acc/i               , Sales                ); --->  0
3        1        prxmatch( /Here/i              , Acctng               ); --->  1
3        2        prxmatch( /Find/i              , Acctng               ); --->  0
3        3        prxmatch( /Acc/i               , Acctng               ); --->  1
4        1        prxmatch( /Here/i              , Findme               ); --->  0
4        2        prxmatch( /Find/i              , Findme               ); --->  1
4        3        prxmatch( /Acc/i               , Findme               ); --->  0
5        1        prxmatch( /Here/i              , Hereiam              ); --->  0
5        2        prxmatch( /Find/i              , Hereiam              ); --->  0
5        3        prxmatch( /Acc/i               , Hereiam              ); --->  0
6        1        prxmatch( /Here/i              , graccaa              ); --->  3
6        2        prxmatch( /Find/i              , graccaa              ); --->  0
6        3        prxmatch( /Acc/i               , graccaa              ); --->  3
NOTE: Execution succeeded. 18 rows affected.
2752  quit;

Upvotes: 0

Views: 271

Answers (1)

Richard

Reputation: 27498

PRXMATCH might be doing some odd caching of compiled regexes, as if an implicit /o were applied. Even so, I can't fully explain the observed output, even when assuming that some PRXMATCH pattern was compiled only 'once'.

Unfortunately, DS2 does not accept CALL PRXDEBUG(1); which might have shed some light.

From the PRXMATCH docs:

Compiling a Perl Regular Expression
If perl-regular-expression is a constant or if it uses the /o option, then the Perl regular expression is compiled once and each use of PRXMATCH reuses the compiled expression. If perl-regular-expression is not a constant and if it does not use the /o option, then the Perl regular expression is recompiled for each call to PRXMATCH.
Note: The compile-once behavior occurs when you use PRXMATCH in a DATA step, in a WHERE clause, or in PROC SQL. For all other uses, the perl-regular-expression is recompiled for each call to PRXMATCH.

So the docs don't spell out what happens in DS2, but they do show that PRXMATCH sometimes compiles a pattern once and reuses it.

The best fix is to explicitly PRXPARSE the dynamic regex pattern to get an id, and pass that id to PRXMATCH:

            dcl int rx;
            rx = prxparse(search_term);
            c = prxmatch(rx, search_text);

This could cause memory problems, because DS2 has no PRXFREE function and does not allow the call routine CALL PRXFREE(rx); so every parsed pattern stays allocated. To avoid the potential memory problem, parse each pattern only once: keep the ids of the prxparsed patterns in an array or hash, and retrieve the id via a search_term lookup.
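A minimal, untested sketch of that caching idea, reusing the tables and variables from the question. The names rxcache and pattern are hypothetical, introduced here for illustration. It assumes a DS2 hash package will take a varchar key and an int data item, and that prxparse returns an id that stays valid for the whole run.

```sas
proc ds2;
    data search_results (overwrite=yes);
        dcl double rc c;
        dcl int rx;                      /* id returned by prxparse        */
        dcl varchar(32) pattern;         /* hypothetical: full regex text  */
        declare char(8) id;
        declare char(11) name;
        declare char(20) search_term search_text;
        dcl package hash h(1, 'search_term_table');
        dcl package hiter hi('h');
        dcl package hash rxcache();      /* hypothetical: pattern -> rx id */
        drop rc rx pattern;

        method init();
            rc = h.keys([id]);
            rc = h.data([id search_term]);
            rc = h.defineDone();
            /* Second hash keeps one prxparse id per distinct pattern. */
            rc = rxcache.keys([pattern]);
            rc = rxcache.data([rx]);
            rc = rxcache.defineDone();
        end;

        method run();
            set master_table;
            rc = hi.first();
            do while (rc = 0);
                pattern = cats('/', search_term, '/i');
                /* find() loads rx on a hit; parse and store on a miss. */
                if rxcache.find() ne 0 then do;
                    rx = prxparse(pattern);
                    rxcache.add();
                end;
                c = prxmatch(rx, search_text);
                put name= search_term= pattern= c=;
                output;
                rc = hi.next();
            end;
        end;
    enddata;
run;
quit;
```

With three search terms, this calls prxparse at most three times, however many rows master_table has, so the missing PRXFREE matters much less.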

Upvotes: 1
