Béa
Béa

Reputation: 121

python3 sqlite and regex query

I use python3 to search an exact word in a sqlite3 database. When I do it with SQLite database browser it works perfectly. But when I do it with python3, it throws: sqlite3.OperationalError: user-defined function raised exception. (I will use this code for an app to search in a treeview)

Here is the code:

import sqlite3
import re


def regexp(expr, item):
    reg = re.compile(expr)
    return reg.search(item) is not None
conn = sqlite3.connect('/database.db')
conn.create_function("REGEXP", 2, regexp)
cursor = conn.cursor()
cursor.execute('SELECT * FROM table WHERE column REGEXP ?',['(?i)(?<=\s|^|\W)word(?=\s|$|\W)'])
data=cursor.fetchall()
print(data)

I updated my code as:

import sqlite3
import re
def regexp
def regexp(expr, item):
    try:
        pattern = re.compile(expr, re.I)

        return re.search(pattern, str) is not None
    except Exception as e:
        print(e)


conn = sqlite3.connect('database.db')
conn.create_function("regexp", 2, regexp)
cursor = conn.cursor()
try:
    cursor.execute(r'SELECT * FROM table WHERE column REGEXP ?',[r'(?i)(?<=\s|^|\W)word(?=\s|$|\W)'])
    data=cursor.fetchall()
    print(data)
except Exception as e:
    print(e)

And it throws : look-behind requires fixed-width pattern

Upvotes: 2

Views: 433

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626754

The (?i)(?<=\s|^|\W)word(?=\s|$|\W) pattern contains a lookbehind with alternatives, two of which match a single-char pattern and one of them matches no char (^ only asserts the position at the start of string).

The problem is that Python re does not support lookbehinds of non-fixed length. Let's try to refactor the regex:

  • (?i)(?<=[\s\W]|^)word(?=[\s\W]|$): Look how single-char matching patterns are grouped with the help of a character class. We now clearly see that [\s\W] can be re-written as \W because \W matches whitespaces. So, the regex can look like
  • (?i)(?<=\W|^)word(?=\W|$): Better, but it is still not what we need, the alternatives do not match the same amount of chars. However, at this point I can clearly see you want to match a word in between non-word chars or start/end of string.

So, you can use

(?i)(?<!\w)word(?!\w)

with (?<!\w) negative lookbehind and (?!\w) negative lookahead both containing a single \w (no more alternation operator necessary!), which is basically

(?i)\bword\b

with mere word boundaries (in Python code, r'(?i)\bword\b').

However, the pattern variant with the lookarounds can be prefereble in cases the word is a variable, when you use it like

fr'(?i)(?<!\w){re.escape(word)}(?!\w)'

As far as the code is concerned, you have a typo in the regexp method: str cannot be used as the input argument to re.search. You can fix it by using

def regexp(expr, item):
    try:
        return re.search(expr, item, re.I) is not None
    except Exception as e:
        print(e)

See demo:

enter image description here

Upvotes: 2

Related Questions