camelNeck

Reputation: 99

Multiple matches of regular expressions with different strings

It's quite difficult to describe (especially as a non-native speaker), but I'll do my best:

In my Python program, I have a database of, let's say, news articles, and users who want to be informed about a subset of them. Every news object has multiple string attributes such as Author, Title or Text.

I want to save each user's interests as an expression that lets me match the different string attributes and combine those matches with logical operators, like this (the syntax doesn't really matter):

attribute author matches pattern (\w*\sSmith) and attribute text doesn't
contain pattern (financ(e|ial))

Then, for every user, I have to iterate over all articles and, if the expression matches an article, inform him/her.

My problem is that I don't really know what language to use. I'd like to avoid creating my own and writing my own parser with all the usual problems (security, escaping, etc.) because I'm sure this is a fairly common problem and there has to be a better solution than I'm able to create.

I've searched the web for some time now, but haven't found anything. Any help is much appreciated; thanks in advance!

[Edit:] Reformat pseudo-code as RabbidRabbit suggested.

Upvotes: 0

Views: 200

Answers (1)

ubik

Reputation: 4560

There are several ways to approach this problem. They range from a list of (attribute, regexp) tuples that you apply on a per-object basis to more complex things.
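The simple end of that range, a list of (attribute, regexp) tuples, could look roughly like the sketch below. The `matches_all` helper and the dict-based article are made up for illustration. It assumes every rule must match (an implicit AND) and uses `re.search`, so a pattern may match anywhere in the attribute.

```python
import re

def matches_all(article, rules):
    """Return True if every (attribute, pattern) rule matches the article."""
    return all(
        # A missing attribute is treated as an empty string (an assumption).
        re.search(pattern, article.get(attribute, "")) is not None
        for attribute, pattern in rules
    )

# Hypothetical sample data, not taken from the question.
article = {"author": "John Smith", "title": "Markets", "text": "A financial report"}
rules = [("author", r"\w*\sSmith"), ("text", r"financ(e|ial)")]

print(matches_all(article, rules))
```

This covers "all of these must match", but it cannot express negation or OR, which is where the more structured options come in.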

One option is to find some kind of declarative "language" with which you can specify simple queries such as the one you mention. This can be something that is stored in a JSON or YAML structure; it all depends on how complex/extensible you want it to be.
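As a rough idea of what such a stored structure might look like, here is a sketch using JSON from the standard library. The key names (`op`, `attr`, `pattern`, `args`) are invented for this example and are not any established format.

```python
import json

# Hypothetical rule layout: the key names are assumptions for illustration.
rule = {
    "op": "and",
    "args": [
        {"op": "match", "attr": "author", "pattern": r"\w*\sSmith"},
        {"op": "not", "args": [
            {"op": "match", "attr": "text", "pattern": r"financ(e|ial)"},
        ]},
    ],
}

stored = json.dumps(rule)      # a string suitable for a database column or file
restored = json.loads(stored)  # back to plain dicts, lists and strings

print(restored == rule)
```

Because the rule is only data, loading it never executes anything. Note that JSON has no tuple type, so anything written as a tuple comes back as a list.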

If you want it to be really extensible, you may even want to have a DSL (domain-specific language):

http://www.slideshare.net/Siddhi/creating-domain-specific-languages-in-python

Here is a past StackOverflow post that may be helpful.

Writing a Domain Specific Language for selecting rows from a table

The simplest solution I can see (to parse, generate and store) is a LISP-style prefix list of tuples, such as:

[('and', ('body', '.*to be or not.*'), ('author', ('not', '.*shakespeare.*'))),
 ...]

If all you need is basic boolean operators and regexes, that should be enough.
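A prefix list like this can be checked with a short recursive function. The sketch below is one possible reading of the format, with assumptions: operator names are stored as strings ('and', 'or', 'not'), a leaf is (attribute, pattern) or (attribute, ('not', pattern)), matching is case-insensitive, and `evaluate` is a made-up name.

```python
import re

def evaluate(expr, article):
    """Evaluate one prefix expression against an article dict."""
    head, *rest = expr
    if head == "and":
        return all(evaluate(sub, article) for sub in rest)
    if head == "or":
        return any(evaluate(sub, article) for sub in rest)
    # Otherwise head is an attribute name and rest holds one pattern spec.
    (spec,) = rest
    value = article.get(head, "")  # missing attribute counts as empty (assumption)
    if isinstance(spec, (tuple, list)) and spec[0] == "not":
        return re.search(spec[1], value, re.IGNORECASE) is None
    return re.search(spec, value, re.IGNORECASE) is not None

# Hypothetical sample data.
rule = ("and", ("body", ".*to be or not.*"), ("author", ("not", ".*shakespeare.*")))
article = {"body": "To be or not to be", "author": "Christopher Marlowe"}

print(evaluate(rule, article))
```

Since the expression is plain data and is never passed to `eval`, a user cannot run arbitrary code through it. User-supplied patterns can still be slow on some inputs (catastrophic backtracking), and in this sketch an attribute cannot be named 'and' or 'or'.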

[Edit] Added example

Upvotes: 1
