Federicofkt
Federicofkt

Reputation: 693

Write SQL script out of pandas values

I have a pd df that looks like this:

tab_name col_name
CV_TAB1 TAB1_COL1
CV_TAB1 TAB1_COL2
CV_TAB1 TAB1_COL3
CV_TAB2 TAB2_COL1
CV_TAB2 TAB2_COL2

what I need is a python script that generates as many SQL scripts as are the different tab_names WITHOUT the CV_ prefix (in the example, 2 scripts) that create a view for every table with associated columns WITHOUT the TABX_ prefix. So:

'CREATE WIEV TAB1 AS (
SELECT 
TAB1_COL1 AS COL1,
TAB1_COL2 AS COL2,
TAB1_COL3 AS COL3
FROM CV_TAB1);'

And another one for TAB2.

Note that in the FROM clause the table name is WITH the CV_.

I've written the code that creates the script:

script = ""
i = 0
list_of_scripts = list()


for t in list_of_table_names:     
    if t['name'][:3] == 'CV_':
        table_name = t['name'][3:] # Removing CV_ from table name
        script += "DROP VIEW IF EXISTS " + str(table_name) + ";\n\n" + "CREATE VIEW "+ str(table_name) + " AS ( " + "\n" + "SELECT\n" # First part of SQL script
        for c in list_of_columns_out_of_pd_df:
            if c['columnName'][:len(table_name)] == str(table_name): # Check for the TABX_ prefix
                column_name = c['columnName'][len(table_name)+1:]
                if i == 0:
                    script += str(c['columnName']) + " AS " + str(column_name) + "\n"
                    i+=1
                else:
                    script += "," + str(c['columnName']) + " AS " + str(column_name) + "\n"
            else:
                if i == 0:
                    script += str(c['columnName']) + " AS " + str(c['columnName']) + "\n"
                    i+=1
                else:
                    script += "," + str(c['columnName']) + " AS " + str(c['columnName']) + "\n"


        i = 0
        script += "FROM " + str(t['name']) + ");"
        list_of_scripts.append(script)
        script = ""

what I am missing are the list_of_table_names and the list_of_columns_out_of_pd_df variables

Thanks!

Upvotes: 1

Views: 60

Answers (1)

Ben.T
Ben.T

Reputation: 29635

Here is a way using groupby.

l = []
for tb, cols in df.groupby('tab_name')['col_name']:
    # aggregate the cols with the wanted format
    _s = ',\n'.join([f'{col} as {col.split("_")[-1]}' for col in cols])
    # create the full query
    s = f'''CREATE VIEW {tb.split("_")[-1]} AS (\nSELECT\n{_s}\nFROM {tb});'''
    l.append(s)

print(l[0])
# CREATE VIEW TAB1 AS (
# SELECT
# TAB1_COL1 as COL1,
# TAB1_COL2 as COL2,
# TAB1_COL3 as COL3
# FROM CV_TAB1);

print(l[1])
# CREATE VIEW TAB2 AS (
# SELECT
# TAB2_COL1 as COL1,
# TAB2_COL2 as COL2
# FROM CV_TAB2);

Upvotes: 3

Related Questions