Reputation: 65
I have a working routine that determines which categories a news item belongs to. It works when the title, category, subcategory, and the search words (as regular expressions) are assigned directly in Python.
But when I retrieve the same values from PostgreSQL as strings, the routine raises no errors and also returns no results.
I checked the data types: both are Python strings.
What can be done to fix this?
import re

# set the text to be analyzed
title = "next week there will be a presentation. The location will be aat"
# these could be the categories
category = "presentation"
subcategory = "scientific"
# these are the regular expressions
main_category_search_words = r'\bpresentation\b'
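# note: under re.X the unescaped space in 'asm microbe' is ignored;
# write 'asm\ microbe' if a literal space should be matched
sub_category_search_words = r'\basm microbe\b | \basco\b | \baat\b'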
category_final = ''
subcategory_final = ''
# identify main category
r = re.compile(main_category_search_words, flags=re.I | re.X)
result = r.findall(title)
if len(result) == 1:
    category_final = category
# identify sub category
r2 = re.compile(sub_category_search_words, flags=re.I | re.X)
result2 = r2.findall(title)
if len(result2) > 0:
    subcategory_final = subcategory
print("analysis result:", category_final, subcategory_final)
Upvotes: 0
Views: 195
Reputation: 5101
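I'm pretty sure that the string you get back from PostgreSQL does not contain the same characters as your raw string literal, so the regex no longer matches what you expect. The r prefix only changes how Python reads a literal in source code: without it, \b becomes a backspace character instead of a backslash followed by b. You will have to escape the backslashes in your pattern explicitly in the DB. Compare: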
print(r"\basm\b")
print("\basm\b")
print("\\basm\\b")
# output
\basm\b
as # yes, including the line break above here
\basm\b
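To check which of these cases you are in, it may help to print the repr() of the value you fetched and try it against the title. The following is only a sketch: the two pattern variables are hypothetical stand-ins for what the DB driver might return (no real PostgreSQL connection is involved), and diagnose is a helper name made up for this example.

```python
import re

title = "next week there will be a presentation. The location will be aat"

# Hypothetical stand-ins for a value fetched from the DB (assumption, not real query results).
stored_literally = r"\baat\b"  # backslash + 'b': a regex word boundary
stored_mangled = "\baat\b"     # backspace characters (0x08) around 'aat'


def diagnose(pattern):
    """Return the repr of the pattern and the matches it finds in title."""
    return repr(pattern), re.findall(pattern, title, flags=re.I)


for label, pattern in [("literal", stored_literally), ("mangled", stored_mangled)]:
    shown, matches = diagnose(pattern)
    print(label, shown, matches)
```

If the repr of your fetched value shows \x08 instead of \\b, the backslashes were lost before or while the pattern was stored, and the pattern compiles without error but finds nothing, which would match the "no errors, no results" symptom.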
Upvotes: 1