ddevalco

Reputation: 1249

Iterate through JSON data, create dict/list and then insert into PostgreSQL database using psycopg2?

This is a multipart issue I am working on. Yes, I am quite new to Python, so I may not express things correctly.

I'm trying to figure out how to iterate through some JSON data that I pull from an API, extract the values, and then build a dict or list of that data so I can INSERT it into a PostgreSQL database.

This is the JSON data I pull from a device:

{u'service_group_stat': {u'cur_conns': 344,
                         u'member_stat_list': [{u'cur_conns': 66,
                                                u'req_bytes': 1476212423,
                                                u'req_pkts': 10449342,
                                                u'resp_bytes': 33132743858,
                                                u'resp_pkts': 25652317,
                                                u'server': u'WWW0006',
                                                u'tot_conns': 172226},
                                               {u'cur_conns': 64,
                                                u'req_bytes': 1666275823,
                                                u'req_pkts': 11982676,
                                                u'resp_bytes': 37575461036,
                                                u'resp_pkts': 29175599,
                                                u'server': u'WWW0005',
                                                u'tot_conns': 205244},
                                               {u'cur_conns': 89,
                                                u'req_bytes': 1671222671,
                                                u'req_pkts': 11940864,
                                                u'resp_bytes': 37064038202,
                                                u'resp_pkts': 28747313,
                                                u'server': u'WWW0004',
                                                u'tot_conns': 195789},
                                               {u'cur_conns': 37,
                                                u'req_bytes': 94117510958,
                                                u'req_pkts': 585916896,
                                                u'resp_bytes': 1860691638618,
                                                u'resp_pkts': 1439228725,
                                                u'server': u'WWW0003',
                                                u'tot_conns': 7366402},
                                               {u'cur_conns': 42,
                                                u'req_bytes': 98580368121,
                                                u'req_pkts': 642797814,
                                                u'resp_bytes': 1934241923560,
                                                u'resp_pkts': 1498242871,
                                                u'server': u'WWW0002',
                                                u'tot_conns': 7221995},
                                               {u'cur_conns': 46,
                                                u'req_bytes': 94886760323,
                                                u'req_pkts': 593577169,
                                                u'resp_bytes': 1863028601218,
                                                u'resp_pkts': 1441197389,
                                                u'server': u'WWW0001',
                                                u'tot_conns': 7260787}],
                         u'name': u'SG_SITE1.BUSINESS.COM_443',
                         u'req_bytes': 292398350319,
                         u'req_pkts': 1856664761,
                         u'resp_bytes': 5765734406492,
                         u'resp_pkts': 4462244214,
                         u'tot_conns': 22422443}}

It's stored as "data".

So then I thought something like this would be needed:

for row in data['service_group_stat']['member_stat_list']:
    SRVR_NAME = row['server']
    CURR_CONNS = row['cur_conns']
    TOTAL_CONNS = row['tot_conns']
    REQ_BYTES = row['req_bytes']
    REQ_PKTS = row['req_pkts']
    RESP_BYTES = row['resp_bytes']
    RESP_PKTS = row['resp_pkts']

The goal is to extract the metrics for each server in that list.
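As written, each pass of the loop overwrites the previous server's values, so I assume I need to collect them somewhere as I go, e.g. by appending a dict per server to a list (untested sketch):

# Collect every server's values instead of overwriting them on each pass
servers = []
for row in data['service_group_stat']['member_stat_list']:
    servers.append({'SRVR_NAME': row['server'],
                    'CURR_CONNS': row['cur_conns'],
                    'TOTAL_CONNS': row['tot_conns'],
                    'REQ_BYTES': row['req_bytes'],
                    'REQ_PKTS': row['req_pkts'],
                    'RESP_BYTES': row['resp_bytes'],
                    'RESP_PKTS': row['resp_pkts']})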

Essentially I am looking to extract the following and insert into a database:

SRVR_NAME  CURR_CONNS  TOT_CONNS  REQ_BYTES    REQ_PKTS   RESP_BYTES     RESP_PKTS
WWW0006    66          172226     1476212423   10449342   33132743858    25652317
WWW0005    64          205244     1666275823   11982676   37575461036    29175599
WWW0004    89          195789     1671222671   11940864   37064038202    28747313
WWW0003    37          7366402    94117510958  585916896  1860691638618  1439228725

I'm not sure if I should read the details for one server, create a dict from that single server's values, and then insert one server at a time into the database?

Something like:

# Create a dict of one server's metrics from the values extracted in the loop above
# (this would run inside the loop, once per server)

metrics = {"SRVR_NAME": SRVR_NAME, "CURR_CONNS": CURR_CONNS,
           "TOTAL_CONNS": TOTAL_CONNS, "REQ_BYTES": REQ_BYTES, "REQ_PKTS": REQ_PKTS,
           "RESP_BYTES": RESP_BYTES, "RESP_PKTS": RESP_PKTS}

# Execute storing metrics for server
# Table: metrics    Fields: (SRVR_NAME, CURR_CONNS, TOTAL_CONNS, REQ_BYTES, REQ_PKTS, RESP_BYTES, RESP_PKTS)

# Using psycopg2 to interact with PostgreSQL; execute() takes a single parameter
# dict (executemany() expects a sequence of them), and the placeholders must
# match the column list one-for-one
cur.execute("""INSERT INTO metrics("SRVR_NAME", "CURR_CONNS", "TOTAL_CONNS", "REQ_BYTES",
                "REQ_PKTS", "RESP_BYTES", "RESP_PKTS") VALUES (%(SRVR_NAME)s, %(CURR_CONNS)s,
                %(TOTAL_CONNS)s, %(REQ_BYTES)s, %(REQ_PKTS)s, %(RESP_BYTES)s, %(RESP_PKTS)s)""", metrics)

I think that's close to how I could insert the metrics for a single server at a time?

I haven't figured the whole thing out yet, but I think it's close.

But the real question is: can I build a bigger list or dict containing the details for every server and then do a bulk insert? Or should I iterate through that new dict/list and insert on each pass?
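For example, assuming the servers list built above and an open psycopg2 cursor cur, both approaches might look something like this (untested sketch):

sql = """INSERT INTO metrics("SRVR_NAME", "CURR_CONNS", "TOTAL_CONNS", "REQ_BYTES",
                             "REQ_PKTS", "RESP_BYTES", "RESP_PKTS")
         VALUES (%(SRVR_NAME)s, %(CURR_CONNS)s, %(TOTAL_CONNS)s, %(REQ_BYTES)s,
                 %(REQ_PKTS)s, %(RESP_BYTES)s, %(RESP_PKTS)s)"""

# Option 1: insert one row per pass through the list
for server in servers:
    cur.execute(sql, server)

# Option 2: hand the whole list to executemany() in a single call
cur.executemany(sql, servers)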

Hope this makes sense. Just not good at concisely breaking this down and explaining.

Hoping to understand the most efficient way to accomplish this.


UPDATE #1

Using some of the suggested code, I am trying the following:

now = datetime.datetime.now()
date = str(now.strftime("%m-%d-%Y"))
time = str(now.strftime("%H:%M:%S"))

# Define a DICT/LIST of ServiceGroup names that we will pull stats for

name = 'SG_ACCOUNT.BUSINESS.COM_443'

# Pull stats from A10 LB for ServiceGroup and store in memory
# Will want to eventually iterate through a DICT/LIST of SG names

data = c.slb.service_group.stats(name)

srv = """INSERT INTO server("SG_NAME", "SRVR_NAME") VALUES ('name', %(server)s)"""
argslist1 = data[u'service_group_stat'][u'member_stat_list']
psycopg2.extras.execute_batch(cur, srv, argslist1, page_size=100)

sql = """INSERT INTO metrics("SRVR_NAME", "DATE", "TIME", "CURR_CONNS", "TOTAL_CONNS", 
                             "REQ_BYTES", "REQ_PKTS", "RESP_BYTES", 
                             "RESP_PKTS") VALUES (%(server)s, 
             'date', 'time', %(curr_conns)s, %(total_conns)s, 
             %(req_bytes)s, %(req_pkts)s, %(resp_bytes)s, %(resp_pkts)s)"""
argslist2 = data[u'service_group_stat'][u'member_stat_list']
psycopg2.extras.execute_batch(cur, sql, argslist2, page_size=100)

However it's failing because I can't seem to pull in previously declared values such as:

date = str(now.strftime("%m-%d-%Y"))
time = str(now.strftime("%H:%M:%S"))
name = 'SG_ACCOUNT.BUSINESS.COM_443'

I get the following error when trying to insert the name:

DETAIL:  Key (SG_NAME)=(name) is not present in table "servicegroup".

I don't think it's actually pulling the value assigned to name; instead it's trying to INSERT the literal string 'name', which doesn't exist in the "servicegroup" table.
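Presumably the fix is to pass those values as query parameters too, e.g. by merging them into each row dict before calling execute_batch() (untested sketch; the sg_name key is my own naming):

rows = data[u'service_group_stat'][u'member_stat_list']
for row in rows:
    # merge the previously declared Python values into every row
    row['sg_name'] = name
    row['date'] = date
    row['time'] = time

# placeholders instead of the literals 'name', 'date' and 'time'
srv = """INSERT INTO server("SG_NAME", "SRVR_NAME") VALUES (%(sg_name)s, %(server)s)"""
psycopg2.extras.execute_batch(cur, srv, rows, page_size=100)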

Upvotes: 0

Views: 1068

Answers (1)

mechanical_meat

Reputation: 169304

Looking at http://initd.org/psycopg/docs/extras.html#module-psycopg2.extras, the docs note that .executemany() is not particularly performant. A faster alternative is execute_batch(), described at http://initd.org/psycopg/docs/extras.html#fast-execution-helpers:

import psycopg2.extras

cur = conn.cursor()  # conn is an existing psycopg2 connection
# the %(...)s placeholder names must match the keys of the dicts in argslist,
# and the number of placeholders must match the column list
sql = """INSERT INTO metrics("SRVR_NAME", "CURR_CONNS", "TOTAL_CONNS",
                             "REQ_BYTES", "REQ_PKTS", "RESP_BYTES",
                             "RESP_PKTS") VALUES (%(server)s,
             %(cur_conns)s, %(tot_conns)s, %(req_bytes)s,
             %(req_pkts)s, %(resp_bytes)s, %(resp_pkts)s)"""
argslist = data[u'service_group_stat'][u'member_stat_list']
psycopg2.extras.execute_batch(cur, sql, argslist, page_size=100)

Here argslist comes from data[u'service_group_stat'][u'member_stat_list'] and looks like:

[{u'cur_conns': 66,
  u'req_bytes': 1476212423,
  u'req_pkts': 10449342,
  u'resp_bytes': 33132743858L,
  u'resp_pkts': 25652317,
  u'server': u'WWW0006',
  u'tot_conns': 172226},
 {u'cur_conns': 64,
  u'req_bytes': 1666275823,
  u'req_pkts': 11982676,
  u'resp_bytes': 37575461036L,
  u'resp_pkts': 29175599,
  u'server': u'WWW0005',
  u'tot_conns': 205244},
 {u'cur_conns': 89,
  u'req_bytes': 1671222671,
  u'req_pkts': 11940864,
  u'resp_bytes': 37064038202L,
  u'resp_pkts': 28747313,
  u'server': u'WWW0004',
  u'tot_conns': 195789},
 {u'cur_conns': 37,
  u'req_bytes': 94117510958L,
  u'req_pkts': 585916896,
  u'resp_bytes': 1860691638618L,
  u'resp_pkts': 1439228725,
  u'server': u'WWW0003',
  u'tot_conns': 7366402},
 {u'cur_conns': 42,
  u'req_bytes': 98580368121L,
  u'req_pkts': 642797814,
  u'resp_bytes': 1934241923560L,
  u'resp_pkts': 1498242871,
  u'server': u'WWW0002',
  u'tot_conns': 7221995},
 {u'cur_conns': 46,
  u'req_bytes': 94886760323L,
  u'req_pkts': 593577169,
  u'resp_bytes': 1863028601218L,
  u'resp_pkts': 1441197389,
  u'server': u'WWW0001',
  u'tot_conns': 7260787}]
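One thing to keep in mind (not shown above): psycopg2 does not autocommit by default, so the batched inserts still need to be committed on the connection:

conn.commit()  # persist the batched INSERTs; otherwise they are rolled back when the connection closes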

Upvotes: 1
