mgo

Reputation: 65

RegEx string works when directly assigned in Python, but not from a PostgreSQL database

I have a working routine that determines which categories a news item belongs to. It works when the title, category, subcategory, and the search words (as regular expressions) are assigned directly in Python.

But when I retrieve the same values from PostgreSQL as strings, the routine produces neither errors nor results.

I checked the data types; both are Python strings.

What can be done to fix this?

import re

# set the text to be analyzed
title = "next week there will be a presentation. The location will be aat"

# these could be the categories
category = "presentation"
subcategory = "scientific"

# these are the regular expressions
main_category_search_words = r'\bpresentation\b'
sub_category_search_words = r'\basm microbe\b | \basco\b | \baat\b'

category_final = ''
subcategory_final = ''

# identify main category
r = re.compile(main_category_search_words, flags=re.I | re.X)
result = r.findall(title)

if len(result) == 1:
    category_final = category

    # identify sub category
    r2 = re.compile(sub_category_search_words, flags=re.I | re.X)
    result2 = r2.findall(title)
    if len(result2) > 0:
        subcategory_final = subcategory

print("analysis result:", category_final, subcategory_final)

Upvotes: 0

Views: 195

Answers (1)

shmee

Reputation: 5101

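I'm pretty sure that what you get back from PostgreSQL does not behave like a raw string literal, so your regex is not the pattern you think it is. In a raw string, \b stays as a backslash followed by b, which the re module reads as a word boundary; in a normal string, \b is a single backspace character. You will have to escape the backslashes in your pattern explicitly in the DB.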

print(r"\basm\b")
print("\basm\b")
print("\\basm\\b")

# output
\basm\b

as       # yes, including the line break above here
\basm\b
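
To check which case you are in, it may help to look at repr() of the value instead of printing it, since print() hides the backspace. Below is a minimal sketch, not your actual setup: stored_ok and stored_broken are hypothetical stand-ins for what a row might contain, and describe() and repair() are helper names I made up for illustration. How a backspace would have ended up in your column is an assumption on my part, since the question does not show how the rows were inserted.

```python
import re

# Hypothetical stand-ins for values read from the database (assumption:
# the real rows are not shown in the question).
stored_ok = "\\basco\\b|\\baat\\b"   # backslash + 'b' survived the round trip
stored_broken = "\basco\b|\baat\b"   # '\b' was turned into a backspace (0x08)

title = "next week there will be a presentation. The location will be aat"


def describe(pattern: str) -> str:
    """Report whether a pattern contains literal backspace characters."""
    if "\x08" in pattern:
        return "contains backspace characters"
    return "no backspace characters"


def repair(pattern: str) -> str:
    """Replace each backspace character with a backslash followed by 'b'."""
    return pattern.replace("\x08", "\\b")


for name, pattern in [("stored_ok", stored_ok), ("stored_broken", stored_broken)]:
    # repr() shows the actual characters, unlike print()
    print(name, repr(pattern), "->", describe(pattern))
    print("  matches:", re.findall(pattern, title, flags=re.I))

# After the repair the broken value behaves like the intact one
print("repaired matches:", re.findall(repair(stored_broken), title, flags=re.I))
```

If repr() of your database value shows '\x08' where you expected '\\b', the pattern was damaged before or while it was stored, and fixing the stored value is cleaner than calling something like repair() on every read.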

Upvotes: 1
