Reputation: 1173
Given the following statement as a string
stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"
I am trying to find a regular expression (or any other way if there is a better one) to split that string into a list ["(col1, col2, col3)", "(100, '() string with parantheses ()', 2.3)"]
such that I can later fill a dicionary with column names and values
d = { "col1" : "100", "col2" : "'() string with parantheses ()'", "col3" : "2.3" }
So far I have the following solution that I don't like (or rather I belive there is a solution using regex only to do the same).
re.findall("\([^\r\n]*\)", stmt)[0].split("values")
# from here on I would have to parse the two strings and fill a dict
I am not able to find a solution where I don't have to split the string by 'values'
using regex only. My main problem is that the second parantheses string where the values are may contain parantheses themself in strings.
Upvotes: 2
Views: 2705
Reputation: 1093
Use blow code and enjoy it.
import re
import ast
def ParseInsert(sql):
cols, values = re.split("values", sql.rstrip(";"), flags=re.IGNORECASE)
cols = cols.strip(" ").rstrip(")").partition("(")[2].rstrip(")").split(",")
cols = [x.strip(" ").strip("`").strip("'").lstrip("[").rstrip("]") for x in cols]
values = values.strip(" ")
d = dict(zip(cols, ast.literal_eval(values)))
return d
sqls = [
"INSERT INTO TBL1(Name,Age)VALUES('Jon',10)",
"INSERT INTO TBL1 (Name, Age) VALUES ('Jon', 10)",
"INSERT INTO TBL1 ( Name , Age ) VALUES ( 'Jon' , 10 )",
"INSERT INTO [TBL1] ( [Name] , [Age] ) VALUES ( 'Jon' , 10 )",
"INSERT INTO 'TBL1' ( 'Name' , 'Age' ) VALUES ( 'Jon' , 10 )",
"INSERT INTO `TBL1` ( `Name` , `Age` ) VALUES ( 'Jon' , 10 )"
]
for sql in sqls:
print(sql)
print(ParseInsert(sql))
print()
Outputs:
INSERT INTO TBL1(Name,Age)VALUES('Jon',10)
{'Name': 'Jon', 'Age': 10}
INSERT INTO TBL1 (Name, Age) VALUES ('Jon', 10)
{'Name': 'Jon', 'Age': 10}
INSERT INTO TBL1 ( Name , Age ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}
INSERT INTO [TBL1] ( [Name] , [Age] ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}
INSERT INTO 'TBL1' ( 'Name' , 'Age' ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}
INSERT INTO `TBL1` ( `Name` , `Age` ) VALUES ( 'Jon' , 10 )
{'Name': 'Jon', 'Age': 10}
Upvotes: 0
Reputation: 78556
If you really want this, your regex will end up being complicated with assertion expressions:
don't have to split the string by 'values'
match = re.findall("""(?<!\') # No preceeding ' before (
\(
(?!\)) # A closing parenthesis must not follow a (
[^\r\n]*?
(?<!\() # An opening parenthesis must not precede a )
\)
(?!\') # No following ' immedaitely after )
""", stmt, re.VERBOSE)
# ['(col1, col2, col3)', "(100, '() string with parantheses ()', 2.3)"]
r = [o.strip() for i in match for o in i[1:-1].split(',')]
d = dict(zip(*r))
# {'col1': '100', 'col3': '2.3', 'col2': "'() string with parantheses ()'"}
You should go with the SQL solution for correctness
Upvotes: 0
Reputation: 50200
Why mess with these ugly hacks? Let SQL parse SQL. Here's a complete program to turn any insert statement into tuples:
my_insert = """insert into some_table (col1, col2, col3) values (100,
'() string with parantheses ()', 2.3);"""
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("create table some_table (col1, col2, col3)")
conn.execute(my_insert)
parsed_rows = list(conn.execute("select * from some_table"))
conn.close()
print(parsed_rows)
# Output:
[(100, '() string with parantheses ()', 2.3)]
Of course you might also want to consider actually storing your data in a database, instead of whatever you're planning to do with them now. In that case, use a filename instead of ":memory:"
when establishing the connection and you'll get persistent storage.
Upvotes: 4
Reputation: 142156
If your statement is always in the same format, you can use some basic string operations and ast.literal_eval
to evaluate the values... Note that this will also end up with the values having types of int, str and float.
import ast
import csv
stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"
pre, values = stmt.rstrip(';').partition(' values ')[::2]
cols = pre.partition('(')[2]
d = dict(zip(cols.rstrip(')').split(', '), ast.literal_eval(values)))
This'll give you:
{'col1': 100, 'col2': '() string with parantheses ()', 'col3': 2.3}
Upvotes: 4
Reputation: 15204
Well, this is not beautiful but assuming the string you work with will always be an insert statement (based on their characteristics), this should work:
stmt = "insert into table (col1, col2, col3) values (100, '() string with parantheses ()', 2.3);"
# if it will always be an insert statement, the following will work.
par1 = stmt[stmt.find("(") + 1:stmt.find(") values")]
par2 = stmt[stmt.find("values (") + 8:-2]
par1_list = par1.split(",")
par2_list = par2.split(",")
d = dict(zip(par1_list, par2_list))
print(d) # prints: {' col2': " '() string with parantheses ()'", ' col3': ' 2.3', 'col1': '100'}
If you have other insert statements, please try this and tell me if it works. Thanks.
Upvotes: 0