ap0
ap0

Reputation: 1173

Split SQL insert statement into key-value pairs

Given the following statement as a string

stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"

I am trying to find a regular expression (or any other way if there is a better one) to split that string into a list ["(col1, col2, col3)", "(100, '() string with parantheses ()', 2.3)"] such that I can later fill a dicionary with column names and values

d = { "col1" : "100", "col2" : "'() string with parantheses ()'", "col3" : "2.3" }

So far I have the following solution that I don't like (or rather I belive there is a solution using regex only to do the same).

re.findall("\([^\r\n]*\)", stmt)[0].split("values")
# from here on I would have to parse the two strings and fill a dict

I am not able to find a solution where I don't have to split the string by 'values' using regex only. My main problem is that the second parantheses string where the values are may contain parantheses themself in strings.

Upvotes: 2

Views: 2705

Answers (5)

D.L.MAN
D.L.MAN

Reputation: 1093

Use blow code and enjoy it.

import re
import ast


def ParseInsert(sql):
    cols, values = re.split("values", sql.rstrip(";"), flags=re.IGNORECASE)
    cols = cols.strip(" ").rstrip(")").partition("(")[2].rstrip(")").split(",")
    cols = [x.strip(" ").strip("`").strip("'").lstrip("[").rstrip("]") for x in cols]
    values = values.strip(" ")
    d = dict(zip(cols, ast.literal_eval(values)))
    return d


sqls = [
    "INSERT INTO TBL1(Name,Age)VALUES('Jon',10)",
    "INSERT INTO TBL1 (Name, Age) VALUES ('Jon', 10)",
    "INSERT INTO TBL1 ( Name , Age ) VALUES ( 'Jon' , 10 )",
    "INSERT INTO [TBL1] ( [Name] , [Age] ) VALUES ( 'Jon' , 10 )",
    "INSERT INTO 'TBL1' ( 'Name' , 'Age' ) VALUES ( 'Jon' , 10 )",
    "INSERT INTO `TBL1` ( `Name` , `Age` ) VALUES ( 'Jon' , 10 )"
]

for sql in sqls:
    print(sql)
    print(ParseInsert(sql))
    print()

Outputs:

INSERT INTO TBL1(Name,Age)VALUES('Jon',10)
{'Name': 'Jon', 'Age': 10}

INSERT INTO TBL1 (Name, Age) VALUES ('Jon', 10)
{'Name': 'Jon', 'Age': 10}

INSERT INTO TBL1 ( Name , Age ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}

INSERT INTO [TBL1] ( [Name] , [Age] ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}

INSERT INTO 'TBL1' ( 'Name' , 'Age' ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}

INSERT INTO `TBL1` ( `Name` , `Age` ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}

Upvotes: 0

Moses Koledoye
Moses Koledoye

Reputation: 78556

If you really want this, your regex will end up being complicated with assertion expressions:

don't have to split the string by 'values'

match = re.findall("""(?<!\') # No preceeding ' before (
                     \(
                     (?!\))  # A closing parenthesis must not follow a (
                     [^\r\n]*?
                     (?<!\() # An opening parenthesis must not precede a )
                     \)
                     (?!\')  # No following ' immedaitely after )
                     """, stmt, re.VERBOSE)
# ['(col1, col2, col3)', "(100, '() string with parantheses ()', 2.3)"]

r = [o.strip() for i in match for o in i[1:-1].split(',')]
d = dict(zip(*r))
# {'col1': '100', 'col3': '2.3', 'col2': "'() string with parantheses ()'"}

You should go with the SQL solution for correctness

Upvotes: 0

alexis
alexis

Reputation: 50200

Why mess with these ugly hacks? Let SQL parse SQL. Here's a complete program to turn any insert statement into tuples:

my_insert = """insert into some_table (col1, col2, col3) values (100, 
                                        '() string with parantheses ()', 2.3);"""

import sqlite3
conn = sqlite3.connect(":memory:")  
conn.execute("create table some_table (col1, col2, col3)")
conn.execute(my_insert)
parsed_rows = list(conn.execute("select * from some_table"))
conn.close()

print(parsed_rows)
# Output:
[(100, '() string with parantheses ()', 2.3)]

Of course you might also want to consider actually storing your data in a database, instead of whatever you're planning to do with them now. In that case, use a filename instead of ":memory:" when establishing the connection and you'll get persistent storage.

Upvotes: 4

Jon Clements
Jon Clements

Reputation: 142156

If your statement is always in the same format, you can use some basic string operations and ast.literal_eval to evaluate the values... Note that this will also end up with the values having types of int, str and float.

import ast
import csv

stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"
pre, values = stmt.rstrip(';').partition(' values ')[::2]
cols = pre.partition('(')[2]
d = dict(zip(cols.rstrip(')').split(', '), ast.literal_eval(values)))

This'll give you:

{'col1': 100, 'col2': '() string with parantheses ()', 'col3': 2.3}

Upvotes: 4

Ma0
Ma0

Reputation: 15204

Well, this is not beautiful but assuming the string you work with will always be an insert statement (based on their characteristics), this should work:

stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"

# if it will always be an insert statement, the following will work.
par1 = stmt[stmt.find("(") + 1:stmt.find(") values")]
par2 = stmt[stmt.find("values (") + 8:-2]

par1_list = par1.split(",")
par2_list = par2.split(",")

d = dict(zip(par1_list, par2_list))

print(d) # prints: {' col2': " '() string with parantheses ()'", ' col3': ' 2.3', 'col1': '100'}

If you have other insert statements, please try this and tell me if it works. Thanks.

Upvotes: 0

Related Questions