Reputation: 192
I need to update a number sql queries to point to a new set of tables. I am having a hard time finding the word ends, i.e. the end of the table names, with python's re. The white space character and the end of the table names can be a newline or a space. The script is not finding matches for the regex.sub() method. I have tried using "\s" instead of "\b". This does find the table names well but then when I replace the string I am losing the original white space character. What am I doing wrong?
import os
import re
scan_dir = r"xxxx"
out_dir = r"xxxx"
process_tables = {"MM_WMS_WORK_LOCATION": "DESIGNER_MM_WMS_WORK_LOCATION",
"MM_WMS_DESIGN": "DESIGNER_MM_WMS_DESIGN",
"MM_WMS_WORK_REQUEST": "DESIGNER_MM_WMS_WORK_REQUEST",
"MM_WMS_COMPATIBLE_UNIT": "DESIGNER_MM_WMS_COMP_UNIT",
"MM_WMS_DN_COST_FACTOR": "DESIGNER_MM_WMS_DN_COST_FACTOR",
"MM_WMS_CU_COST_FACTOR": "DESIGNER_MM_WMS_CU_COST_FACTOR",
"MM_WMS_WR_USERS": "DESIGNER_MM_WMS_WR_USERS",
"MM_PX_CURRENT_STATE": "DESIGNER_MM_PX_CURRENT_STATE"}
sql_list = os.listdir(scan_dir)
for sql in sql_list:
if sql[-4:] == ".sql":
with open(os.path.join(scan_dir, sql), "r") as sql_in:
sql_out = sql_in.read()
for old_name, new_name in process_tables.items():
regex = re.compile(old_name + "\b", re.IGNORECASE)
sql_out = regex.sub(new_name.lower(), sql_out)
with open(os.path.join(out_dir, sql[:-4] + " Designer Test.sql"), "w") as file_out:
file_out.write(sql_out)
Upvotes: 0
Views: 770
Reputation: 338326
Python string literals use the backslash as an escape character. The string literal "\b"
will contain the backspace character (ASCII 0x8), just like "\n"
will contain a newline. It will not contain "backslash b", as you would need for the regex to work.
To get "backslash b" you need to escape the backslash in the string literal, or use a raw string, which is "\\b"
or r"\b"
, respectively.
regex = re.compile(old_name + r"\b", re.IGNORECASE)
Commonly, raw strings are used for regex in Python, so you don't need to double every backslash.
Read https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
Side note: "\s"
works because - in contrast to "\b"
or "\n"
- it means nothing special in a Python string literal, so it actually becomes "backslash s" for the regex.
Upvotes: 1