VISQL
VISQL

Reputation: 2038

Use Python to parse multiple json items into a SQLite database

I'm looking for a non-explicit way to parse a JSON list (i.e. [] with {} items inside) into a sqlite database.

Specifically, I'm stuck at point where I want to say

INSERT into MYTABLE (col 1, col2, ...) this datarow in this jsondata

I feel that there should be a way to abstract things such that the line above is almost what it takes. My data is a JSON list, containing a multiple JSON dictionaries. Each dictionary has a dozen or so key:value pairs. Nothing nested.

{"object_id":3          ,"name":"rsid"          ,"column_id":1
,"system_type_id":127   ,"user_type_id":127     ,"max_length":8
,"precision":19         ,"scale":0              ,"collation_name":null
,"is_nullable":false    ,"is_ansi_padded":false ,"is_rowguidcol":false
,"is_identity":false    ,"is_computed":false    ,"is_filestream":false
,"is_replicated":false  ,"is_non_sql_subscribed":false
,"is_merge_published":false                     ,"is_dts_replicated":false
,"is_xml_document":false                        ,"xml_collection_id":0
,"default_object_id":0  ,"rule_object_id":0     ,"is_sparse":false
,"is_column_set":false}

Every single itesm in the json list [{k1:v1a, k2:v2a}, {k1:v1b,k2:v2b},...] will have the exact same KEY names. I've taken these as the names of the columns in my sqlite database. The VALUES will be different though. Hence a table is filled by getting the VALUES for each KEY/COLUMN into that row for that item.

k1  | k2  | k3  | ... | km
v1a | v2a | v3a | ... | vma
v1b | v2b | v3b | ... | vmb
...
v1n | v2n | v3n | ... | vmn

IN SQL, insert statements need not be written in the exact same order as the columns of the database. This is because you specify the columns you are inserting for (and thus the order) in the INSERT declaration. This seems to be perfect for JSON, in which every single row/item in the JSON list contains its column-name (the key). Therefore, I want a statement that says "given this row of JSON, insert all of its data into the SQL table by coordinating the JSON key name with the SQL table column name". This is what I mean by non-explicit.

import json

r3 = some data file you read and close
r4 = json.loads(r3)
# let's dump this into SQLite

import sqlite3

the_database = sqlite3.connect("sys_col_database.sqlite")
the_cursor = the_database.cursor()

row_keys = r4[0].keys()
# all of the key are below for reference. 25 total keys.
'''
'is_merge_published',   'rule_object_id',   'system_type_id',
'is_xml_document',      'user_type_id',     'is_ansi_padded',
'column_id',            'is_column_set',    'scale',
'is_dts_replicated',    'object_id',        'xml_collection_id',
'max_length',           'collation_name',   'default_object_id',
'is_rowguidcol',        'precision',        'is_computed',
'is_sparse',            'is_filestream',    'name',
'is_nullable',          'is_identity',      'is_replicated',
'is_non_sql_subscribed'
'''

sys_col_table_statement = """create table sysColumns (
    is_merge_published text,
    rule_object_id integer,
    system_type_id integer,
    is_xml_document text,
    user_type_id integer,
    is_ansi_padded text,
    column_id integer,
    is_column_set text,
    scale integer,
    is_dts_replicated text,
    object_id integer,
    xml_collection_id integer,
    max_length integer,
    collation_name text,
    default_object_id integer,
    is_rowguidcol text,
    precision integer,
    is_computed text,
    is_sparse text,
    is_filestream text,
    name text,
    is_nullable text,
    is_identity text,
    is_replicated text,
    is_non_sql_subscribed text
    )
"""

the_cursor.execute(sys_col_table_statement)

insert_statement = """insert into sysColumns values (
{0},{1},{2},{3},{4},
{5},{6},{7},{8},{9},
{10},{11},{12},{13},{14},
{15},{16},{17},{18},{19},
{20},{21},{22},{23},{24})""".format(*r4[0].keys())

This where I am stuck. insert_statement is a construction of the string that will be executed. Now I need to execute it, but feed it the right data from each JSON item in r4. I'm not sure how to write that.

Upvotes: 1

Views: 1940

Answers (1)

Yuri L
Yuri L

Reputation: 441

Here are two options. I also loaded the JSON data in one step.

#python 3.4.3

import json
import sqlite3

with open("data.json",'r') as f:
    r4 = json.load(f)

the_database = sqlite3.connect("sys_col_database.sqlite")
the_cursor = the_database.cursor()

row_keys = list(r4[0].keys())
# all of the key are below for reference. 25 total keys.
'''
'is_merge_published',   'rule_object_id',   'system_type_id',
'is_xml_document',      'user_type_id',     'is_ansi_padded',
'column_id',            'is_column_set',    'scale',
'is_dts_replicated',    'object_id',        'xml_collection_id',
'max_length',           'collation_name',   'default_object_id',
'is_rowguidcol',        'precision',        'is_computed',
'is_sparse',            'is_filestream',    'name',
'is_nullable',          'is_identity',      'is_replicated',
'is_non_sql_subscribed'
'''

sys_col_table_statement = """create table sysColumns (
is_merge_published text,
rule_object_id integer,
system_type_id integer,
is_xml_document text,
user_type_id integer,
is_ansi_padded text,
column_id integer,
is_column_set text,
scale integer,
is_dts_replicated text,
object_id integer,
xml_collection_id integer,
max_length integer,
collation_name text,
default_object_id integer,
is_rowguidcol text,
precision integer,
is_computed text,
is_sparse text,
is_filestream text,
name text,
is_nullable text,
is_identity text,
is_replicated text,
is_non_sql_subscribed text
)
"""
# method 1
# create a string with the keys prepopulated
prepared_insert_statement = """insert into sysColumns (\
{0}, {1}, {2}, {3}, {4}, \
{5}, {6}, {7}, {8}, {9}, \
{10}, {11}, {12}, {13}, {14}, \
{15}, {16}, {17}, {18}, {19}, \
{20}, {21}, {22}, {23}, {24}) values (\
'{{{0}}}', '{{{1}}}', '{{{2}}}', '{{{3}}}', '{{{4}}}', \
'{{{5}}}', '{{{6}}}', '{{{7}}}', '{{{8}}}', '{{{9}}}', \
'{{{10}}}', '{{{11}}}', '{{{12}}}', '{{{13}}}', '{{{14}}}', \
'{{{15}}}', '{{{16}}}', '{{{17}}}', '{{{18}}}', '{{{19}}}', \
'{{{20}}}', '{{{21}}}', '{{{22}}}', '{{{23}}}', '{{{24}}}'\
)""".format(*r4[0].keys())

# unpack the dictionary to extract the values
insert_statement = prepared_insert_statement.format(**r4[0])

# method 2
# get the keys and values in one step
r4_0_items = list(r4[0].items())
insert_statement2 = """insert into sysColumns (\
{0[0]}, {1[0]}, {2[0]}, {3[0]}, {4[0]}, \
{5[0]}, {6[0]}, {7[0]}, {8[0]}, {9[0]}, \
{10[0]}, {11[0]}, {12[0]}, {13[0]}, {14[0]}, \
{15[0]}, {16[0]}, {17[0]}, {18[0]}, {19[0]}, \
{20[0]}, {21[0]}, {22[0]}, {23[0]}, {24[0]}) values (\
'{0[1]}', '{1[1]}', '{2[1]}', '{3[1]}', '{4[1]}', \
'{5[1]}', '{6[1]}', '{7[1]}', '{8[1]}', '{9[1]}', \
'{10[1]}', '{11[1]}', '{12[1]}', '{13[1]}', '{14[1]}', \
'{15[1]}', '{16[1]}', '{17[1]}', '{18[1]}', '{19[1]}', \
'{20[1]}', '{21[1]}', '{22[1]}', '{23[1]}', '{24[1]}'\
)""".format(*r4_0_items)


the_cursor.execute(sys_col_table_statement)
#the_cursor.execute(insert_statement)
the_cursor.execute(insert_statement2)
the_database.commit()

For reference:
unpack dictionary
string formatting examples

Upvotes: 2

Related Questions