Database Structure txt file comparison using python

Question

I have 2 unloaded DB structure text files (one before upgrade and one after the upgrade). This is just the DB/Table structure - like TableName, ColumnName, DataTpe, DataLength, ColumnOrder. During the DB Upgrade, some new tables and columns (to existing tables) are added, some columns are dropped. Also, the DataType/DataLength of the TableName.Column can change (Ex: int can change to bigint, varchar(100) can change to varchar(500) ).

A part of Unloaded DB Structure before upgrade:

employee,Emp_id,int,,1
employee,Emp_L_Name,varchar,10,2
employee,Emp_F_Name,varchar,20,3
employee,Emp_Gender,varchar,6,4
employee,Emp_dept,int,,5
TableA,ColumnA1,int,,1
TableA,ColumnA2,varchar,100,2
TableB,ColumnB1,bigint,,1
TableB,ColumnB2,varchar,10,2

A part of Unloaded DB Structure After upgrade:

employee,Emp_id,bigint,,1
employee,Emp_L_Name,varchar,20,2
employee,Emp_F_Name,varchar,30,3
employee,Emp_Gender,varchar,6,4
employee,Emp_Sal,int,,5
TableB,ColumnB1,bigint,,1
TableB,ColumnB2,varchar,10,2
TableB,ColumnB3,smallint,,3
TableC,ColumnC1,int,,1
TableC,ColumnC2,varchar,100,2

The EMPLOYEE table structure is just a part of the unloaded DB structure file. There are more than 200 tables in the same DB structure text file.

Can someone suggest an efficient way in python to compare the txt files (before and after upgrade) and find the difference in the DB-- basically by comparing text of the DB structure file (say DB_before.txt and DB_After.txt)? I need the following output:

New Tables added.
Tables Dropped
New Columns added
Columns dropped
DataType change (like emp_id from int to bigint), DataLength changes (Like emp_L_Name from varchar(10) to varchar(20)), ColumnOrder change etc.

Grismar · Accepted Answer

Although you should really provide an example of what you tried - both to get a better answer and to learn something in the process - I thought the problem was interesting enough to solve over coffee.

Here's my attempt:

from io import StringIO
from collections import defaultdict

# instead of files, these are memory objects, but they behave just like open files would
text1 = StringIO("""employee,Emp_id,int,,1
employee,Emp_L_Name,varchar,10,2
employee,Emp_F_Name,varchar,20,3
employee,Emp_Gender,varchar,6,4
employee,Emp_dept,int,,5
TableA,ColumnA1,int,,1
TableA,ColumnA2,varchar,100,2
TableB,ColumnB1,bigint,,1
TableB,ColumnB2,varchar,10,2""")

text2 = StringIO("""employee,Emp_id,bigint,,1
employee,Emp_L_Name,varchar,20,2
employee,Emp_F_Name,varchar,30,3
employee,Emp_Gender,varchar,6,4
employee,Emp_Sal,int,,5
TableB,ColumnB1,bigint,,1
TableB,ColumnB2,varchar,10,2
TableB,ColumnB3,smallint,,3
TableC,ColumnC1,int,,1
TableC,ColumnC2,varchar,100,2""")


def dict_from_text(f):
    structure = defaultdict(dict)
    for line in f:
        # split, ignoring the line ending
        parts = line.strip().split(',')
        # this next line is for clarity, I would suggest merging it with the next
        table, field, field_type = parts[0], parts[1], parts[2] if not parts[3] else f'{parts[2]}({parts[3]})'
        structure[table][field] = field_type
    return structure


structure1 = dict_from_text(text1)
structure2 = dict_from_text(text2)

# with the dicts, answering the questions is easy
new_tables_added = structure2.keys() - structure1.keys()
print(f'tables added: {new_tables_added}')

tables_dropped = structure1.keys() - structure2.keys()
print(f'tables dropped: {tables_dropped}')

columns_added = {t: structure2[t].keys() - structure1[t].keys() for t in structure2 if t in structure1}
print(f'columns added: {columns_added}')

columns_dropped = {t: structure1[t].keys() - structure2[t].keys() for t in structure1 if t in structure2}
print(f'columns dropped: {columns_dropped}')

# the last one is a bit more challenging in one comprehension, but still readable if you spread it a bit
field_type_changes = {
    t: {
        v: (structure1[t][v], structure2[t][v]) 
        for v in structure1[t].keys() 
        if v in structure2[t] and structure1[t][v] != structure2[t][v]
    } for t in structure1 if t in structure2
}
print(f'field types changed: {field_type_changes}')

Result:

tables added: {'TableC'}
tables dropped: {'TableA'}
columns added: {'employee': {'Emp_Sal'}, 'TableB': {'ColumnB3'}}
columns dropped: {'employee': {'Emp_dept'}, 'TableB': set()}
datatypes changed: {'employee': {'Emp_id': ('int', 'bigint'), 'Emp_L_Name': ('varchar(10)', 'varchar(20)'), 'Emp_F_Name': ('varchar(20)', 'varchar(30)')}, 'TableB': {}}

If you prefer the elements that have an empty set or empty dictionary to indicate no change not to be in the result, that's left as an exercise, I guess. There's plenty of other answers on SO explaining how to do that and putting it in now would have taken away from the clean solution I had.

Database Structure txt file comparison using python

Answers (1)

Related Questions