amhrpi
amhrpi

Reputation: 53

Matching string across multiple regular expressions

I have a postgresql database with table that contains about 50 million entries of strings that are addresses. Example of strings are

NIAID, Opportunist Infect Res Branch, Treatment Res Programs, Div Aids, Bethesda, MD USA
PRINCETON UNIV,DEPT PSYCHOL,PRINCETON,NJ 08544

etc.

Now I have to check if an address matches any of a list of about 30,000 regular expressions. I also need to know which regular expression is matched. Examples of regular expressions are

%umass mem med ctr worcester%worcester%ma%
%darnnouth% 

This is of course in the 'LIKE' format for postgres. Since regex matching cannot take advantage of indexing (not a lot anyway, I have already indexed the field varchar_pattern_ops), the total running time of this operation will be about 30000x50 million.

I can also use python to do regex matching if there exists any python library that will help me quicken the process.

Thanks for the help!

Upvotes: 1

Views: 165

Answers (1)

Chris Travers
Chris Travers

Reputation: 26464

What's wrong with:

CREATE TABLE regex (
     regex text primary key
);

SELECT * FROM my_table
  JOIN regex ON mytextfield like regex;

Upvotes: 1

Related Questions