Daniel F

Reputation: 14239

In Python, how can I query a list of words to check whether it matches certain query criteria?

The query criteria should support boolean operators and regular expressions. I've read about Booleano, but it doesn't support regular expressions.

If nothing out there meets these requirements, which technology would be the best to build upon?

The grammar in the example below is only illustrative, but the features it offers should be supported.

is True if ('client/.+' and 'user_a') but (not 'limited' unless ('.+special' or 'godmode'))

which is equivalent to

is True if 'client/.+' and 'user_a' and (not ('limited' and (not ('.+special' or 'godmode'))))
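The step from the "but ... unless" wording to the nested form reads "not 'limited' unless special" as "(not limited) or special". A brute-force truth-table check in plain Python is a quick way to confirm the two forms agree. The function and parameter names are illustrative only: `a` stands for ('client/.+' and 'user_a'), `limited` for 'limited', and `special` for ('.+special' or 'godmode').

```python
from itertools import product


def original_form(a, limited, special):
    # "a but (not limited unless special)": the restriction "not limited"
    # only applies when "special" is false, i.e. (not limited) or special.
    return a and ((not limited) or special)


def rewritten_form(a, limited, special):
    # The nested form used in the question.
    return a and (not (limited and (not special)))


# Compare both forms on all 8 combinations of truth values.
for a, limited, special in product([False, True], repeat=3):
    assert original_form(a, limited, special) == rewritten_form(a, limited, special)
print("equivalent on all 8 cases")
```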

applied to the following lists:

is_true  = ['client/chat', 'user_a', 'limited', 'extraspecial']
is_false = ['client/ping', 'user_a', 'limited']
is_false = ['server/chat']
is_false = ['server/ping', 'ping']
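To pin down the intended semantics, here is a plain-Python sketch with the example criterion hard-coded (no query parser). It assumes, as the answer's code does, that a term is true when at least one tag matches the regex completely (`re.fullmatch`), not merely as a prefix. The helper names `has` and `criterion` are hypothetical.

```python
import re


def has(tags, pattern):
    # A term is true if any tag matches the whole regex, not just a prefix.
    return any(re.fullmatch(pattern, tag) for tag in tags)


def criterion(tags):
    # 'client/.+' and 'user_a' and (not ('limited' and (not ('.+special' or 'godmode'))))
    special = has(tags, ".+special") or has(tags, "godmode")
    return (has(tags, "client/.+") and has(tags, "user_a")
            and not (has(tags, "limited") and not special))


examples = [
    (['client/chat', 'user_a', 'limited', 'extraspecial'], True),
    (['client/ping', 'user_a', 'limited'], False),
    (['server/chat'], False),
    (['server/ping', 'ping'], False),
]
for tags, expected in examples:
    assert criterion(tags) == expected, tags
```

A general solution has to build this same evaluation from the query string at run time, which is what a parser library provides.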

Upvotes: 0

Views: 989

Answers (1)

Daniel F

I managed to solve the problem using the pyparsing module.

import re
import pyparsing


class BoolRegEx(object):
    """Evaluate a boolean query of regex terms against a list of tags.

    Inside a term, <...> stands for a regex group (...) and << / >> for
    literal < / >, because ( and ) are reserved for boolean grouping.
    """

    # A term is any run of printable characters except the grouping parens.
    TERM = pyparsing.Word(pyparsing.printables, excludeChars="()")

    def Match(self, tags=None, query=""):
        self.tags = tags if tags is not None else []
        if ' ' not in query:
            # A single term needs no boolean parsing.
            return self.Search(query)
        # NOT binds tightest, then AND, then OR (the usual boolean precedence).
        # Keyword, unlike Literal, will not split a term such as "NOTE" or "ANDROID".
        grammar = pyparsing.infixNotation(self.TERM, [
            (pyparsing.Keyword("NOT"), 1, pyparsing.opAssoc.RIGHT, self.Not),
            (pyparsing.Keyword("AND"), 2, pyparsing.opAssoc.LEFT, self.And),
            (pyparsing.Keyword("OR"), 2, pyparsing.opAssoc.LEFT, self.Or),
        ])
        # A lone term comes back as a string, anything else as a bool.
        return self.Eval(grammar.parseString(query, parseAll=True)[0])

    def Search(self, term):
        # Translate <...> into (...), using #~ / ~# as placeholders so that
        # << and >> survive as literal < and >.
        pattern = (term.replace("<<", "#~").replace(">>", "~#")
                   .replace("<", "(").replace(">", ")")
                   .replace("#~", "<").replace("~#", ">"))
        regex = re.compile(pattern)
        # The term must match a whole tag, not just a prefix of it.
        return any(regex.fullmatch(tag) for tag in self.tags)

    def Eval(self, operand):
        # Raw terms arrive as strings; already-evaluated groups as bools.
        if isinstance(operand, str):
            return self.Search(operand)
        return bool(operand)

    def And(self, t):
        # t[0] is [operand, "AND", operand, ...]; [0::2] skips the operators.
        return all(self.Eval(a) for a in t[0][0::2])

    def Or(self, t):
        return any(self.Eval(a) for a in t[0][0::2])

    def Not(self, t):
        # t[0] is ["NOT", operand].
        return not self.Eval(t[0][1])


QUERY = "client/.+ AND user_a AND NOT ( limited AND NOT ( .+<r|i>special OR godmode ) )"

print(BoolRegEx().Match(['client/chat', 'user_a', 'limited', 'extraspecial'], QUERY))
# False

print(BoolRegEx().Match(['client/chat', 'user_a', 'limited', 'superspecial'], QUERY))
# True

I had to write regex groups as <...> instead of (...), because parentheses are already used for boolean grouping in the query; << and >> stand for literal < and >. So far, this seems to be the best solution.
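The bracket substitution can be isolated and tested on its own. The following sketch mirrors the replacement chain in `Search`: single `<` and `>` become a regex group, while `<<` and `>>` stand for literal angle brackets. The function name `to_regex` is hypothetical. Like the original, it uses `#~` and `~#` as temporary placeholders, so it assumes no term contains those character pairs itself.

```python
def to_regex(term):
    # Protect the escaped brackets first, using placeholders that are
    # assumed not to occur in any term.
    pattern = term.replace("<<", "#~").replace(">>", "~#")
    # Single angle brackets become regex group delimiters.
    pattern = pattern.replace("<", "(").replace(">", ")")
    # Finally restore the escaped brackets as literal characters.
    return pattern.replace("#~", "<").replace("~#", ">")


print(to_regex(".+<r|i>special"))  # .+(r|i)special
print(to_regex("<<html>>"))        # <html>
```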

Upvotes: 1
