JDGD
JDGD

Reputation: 543

Update PostgreSQL database with daily stock prices in Python

So I found a great script over at QuantState that had a great walk-through on setting up my own securities database and loading in historical pricing information. However, I'm not trying to modify the script so that I can run it daily and have the latest stock quotes added.

I adjusted the initial data load to just download 1 week worth of historicals, but I've been having issues with writing the SQL statement to see if the row exists already before adding. Can anyone help me out with this. Here's what I have so far:

def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
  """Takes a list of tuples of daily data and adds it to the
   database. Appends the vendor ID and symbol ID to the data.

  daily_data: List of tuples of the OHLC data (with 
  adj_close and volume)"""

  # Create the time now
  now = datetime.datetime.utcnow()

  # Amend the data to include the vendor ID and symbol ID
  daily_data = [(data_vendor_id, symbol_id, d[0], now, now,
    d[1], d[2], d[3], d[4], d[5], d[6]) for d in daily_data]

  # Create the insert strings
  column_str = """data_vendor_id, symbol_id, price_date, created_date, 
          last_updated_date, open_price, high_price, low_price, 
          close_price, volume, adj_close_price"""
  insert_str = ("%s, " * 11)[:-2]
  final_str = "INSERT INTO daily_price (%s) VALUES (%s) WHERE NOT EXISTS (SELECT 1 FROM daily_price WHERE symbol_id = symbol_id AND price_date = insert_str[2])" % (column_str, insert_str)

  # Using the postgre connection, carry out an INSERT INTO for every symbol
  with con: 
    cur = con.cursor()
    cur.executemany(final_str, daily_data)

Upvotes: 0

Views: 2297

Answers (1)

khampson
khampson

Reputation: 15316

Some notes regarding your code above:

It's generally easier to defer to now() in pure SQL than to try in Python whenever possible. It avoids lots of potential pitfalls with timezones, library differences, etc.

If you construct a list of columns, you can dynamically generate a string of %s's based on its size, and don't need to hardcode the length into a repeated string with is then sliced.

Since it appears that insert_daily_data_into_db is meant to be called from within a loop on a per-stock basis, I don't believe you want to use executemany here, which would require a list of tuples and is very different semantically.

You were comparing symbol_id to itself in the sub select, instead of a particular value (which would mean it's always true).

To prevent possible SQL Injection, you should always interpolate values in the WHERE clause, including sub selects.

Note: I'm assuming that you're using psycopg2 to access Postgres, and that the primary key for the table is a tuple of (symbol_id, price_date). If not, the code below would need to be tweaked at least a bit.

With those points in mind, try something like this (untested, since I don't have your data, db, etc. but it is syntactically valid Python):

def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
    """Takes a list of tuples of daily data and adds it to the
    database. Appends the vendor ID and symbol ID to the data.

    daily_data: List of tuples of the OHLC data (with
    adj_close and volume)"""


    column_list = ["data_vendor_id", "symbol_id", "price_date", "created_date",
                    "last_updated_date", "open_price", "high_price", "low_price",
                    "close_price", "volume", "adj_close_price"]

    insert_list = ['%s'] * len(column_str)

    values_tuple = (data_vendor_id, symbol_id, daily_data[0], 'now()', 'now()', daily_data[1],
                    daily_data[2], daily_data[3], daily_data[4], daily_data[5], daily_data[6])

    final_str = """INSERT INTO daily_price ({0})
                     VALUES ({1})
                     WHERE NOT EXISTS (SELECT 1
                                       FROM daily_price
                                       WHERE symbol_id = %s
                                         AND price_date = %s)""".format(', '.join(column_list), ', '.join(insert_list))

    # Using the postgre connection, carry out an INSERT INTO for every symbol
    with con:
        cur = con.cursor()
        cur.execute(final_str, values_tuple, values_tuple[1], values_tuple[2])

Upvotes: 1

Related Questions