Reputation: 543
So I found a great script over at QuantState that had a great walk-through on setting up my own securities database and loading in historical pricing information. However, I'm not trying to modify the script so that I can run it daily and have the latest stock quotes added.
I adjusted the initial data load to just download 1 week worth of historicals, but I've been having issues with writing the SQL statement to see if the row exists already before adding. Can anyone help me out with this. Here's what I have so far:
def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
"""Takes a list of tuples of daily data and adds it to the
database. Appends the vendor ID and symbol ID to the data.
daily_data: List of tuples of the OHLC data (with
adj_close and volume)"""
# Create the time now
now = datetime.datetime.utcnow()
# Amend the data to include the vendor ID and symbol ID
daily_data = [(data_vendor_id, symbol_id, d[0], now, now,
d[1], d[2], d[3], d[4], d[5], d[6]) for d in daily_data]
# Create the insert strings
column_str = """data_vendor_id, symbol_id, price_date, created_date,
last_updated_date, open_price, high_price, low_price,
close_price, volume, adj_close_price"""
insert_str = ("%s, " * 11)[:-2]
final_str = "INSERT INTO daily_price (%s) VALUES (%s) WHERE NOT EXISTS (SELECT 1 FROM daily_price WHERE symbol_id = symbol_id AND price_date = insert_str[2])" % (column_str, insert_str)
# Using the postgre connection, carry out an INSERT INTO for every symbol
with con:
cur = con.cursor()
cur.executemany(final_str, daily_data)
Upvotes: 0
Views: 2297
Reputation: 15316
Some notes regarding your code above:
It's generally easier to defer to now()
in pure SQL than to try in Python whenever possible. It avoids lots of potential pitfalls with timezones, library differences, etc.
If you construct a list of columns, you can dynamically generate a string of %s
's based on its size, and don't need to hardcode the length into a repeated string with is then sliced.
Since it appears that insert_daily_data_into_db
is meant to be called from within a loop on a per-stock basis, I don't believe you want to use executemany
here, which would require a list of tuples and is very different semantically.
You were comparing symbol_id to itself in the sub select
, instead of a particular value (which would mean it's always true).
To prevent possible SQL Injection, you should always interpolate values in the WHERE
clause, including sub select
s.
Note: I'm assuming that you're using psycopg2 to access Postgres, and that the primary key for the table is a tuple of (symbol_id, price_date)
. If not, the code below would need to be tweaked at least a bit.
With those points in mind, try something like this (untested, since I don't have your data, db, etc. but it is syntactically valid Python):
def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
"""Takes a list of tuples of daily data and adds it to the
database. Appends the vendor ID and symbol ID to the data.
daily_data: List of tuples of the OHLC data (with
adj_close and volume)"""
column_list = ["data_vendor_id", "symbol_id", "price_date", "created_date",
"last_updated_date", "open_price", "high_price", "low_price",
"close_price", "volume", "adj_close_price"]
insert_list = ['%s'] * len(column_str)
values_tuple = (data_vendor_id, symbol_id, daily_data[0], 'now()', 'now()', daily_data[1],
daily_data[2], daily_data[3], daily_data[4], daily_data[5], daily_data[6])
final_str = """INSERT INTO daily_price ({0})
VALUES ({1})
WHERE NOT EXISTS (SELECT 1
FROM daily_price
WHERE symbol_id = %s
AND price_date = %s)""".format(', '.join(column_list), ', '.join(insert_list))
# Using the postgre connection, carry out an INSERT INTO for every symbol
with con:
cur = con.cursor()
cur.execute(final_str, values_tuple, values_tuple[1], values_tuple[2])
Upvotes: 1