fallingdog
fallingdog

Reputation: 192

Replace table names in sql with python and regex

I need to update a number sql queries to point to a new set of tables. I am having a hard time finding the word ends, i.e. the end of the table names, with python's re. The white space character and the end of the table names can be a newline or a space. The script is not finding matches for the regex.sub() method. I have tried using "\s" instead of "\b". This does find the table names well but then when I replace the string I am losing the original white space character. What am I doing wrong?

import os
import re

scan_dir = r"xxxx"
out_dir = r"xxxx"

process_tables = {"MM_WMS_WORK_LOCATION": "DESIGNER_MM_WMS_WORK_LOCATION",
                  "MM_WMS_DESIGN": "DESIGNER_MM_WMS_DESIGN",
                  "MM_WMS_WORK_REQUEST": "DESIGNER_MM_WMS_WORK_REQUEST",
                  "MM_WMS_COMPATIBLE_UNIT": "DESIGNER_MM_WMS_COMP_UNIT",
                  "MM_WMS_DN_COST_FACTOR": "DESIGNER_MM_WMS_DN_COST_FACTOR",
                  "MM_WMS_CU_COST_FACTOR": "DESIGNER_MM_WMS_CU_COST_FACTOR",
                  "MM_WMS_WR_USERS": "DESIGNER_MM_WMS_WR_USERS",
                  "MM_PX_CURRENT_STATE": "DESIGNER_MM_PX_CURRENT_STATE"}
sql_list = os.listdir(scan_dir)

for sql in sql_list:
    if sql[-4:] == ".sql":
        with open(os.path.join(scan_dir, sql), "r") as sql_in:
            sql_out = sql_in.read()
            for old_name, new_name in process_tables.items():
                regex = re.compile(old_name + "\b", re.IGNORECASE)
                sql_out = regex.sub(new_name.lower(), sql_out)
        with open(os.path.join(out_dir, sql[:-4] + " Designer Test.sql"), "w") as file_out:
            file_out.write(sql_out)

Upvotes: 0

Views: 770

Answers (1)

Tomalak
Tomalak

Reputation: 338326

Python string literals use the backslash as an escape character. The string literal "\b" will contain the backspace character (ASCII 0x8), just like "\n" will contain a newline. It will not contain "backslash b", as you would need for the regex to work.

To get "backslash b" you need to escape the backslash in the string literal, or use a raw string, which is "\\b" or r"\b", respectively.

regex = re.compile(old_name + r"\b", re.IGNORECASE)

Commonly, raw strings are used for regex in Python, so you don't need to double every backslash.

Read https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals


Side note: "\s" works because - in contrast to "\b" or "\n" - it means nothing special in a Python string literal, so it actually becomes "backslash s" for the regex.

Upvotes: 1

Related Questions