BCR
BCR

Reputation: 982

MySQL and Python is generating a duplicate entry error I cannot resolve

I'm new to MySQL but have a solid foundation in Python. After researching this extensively over the last 2 days including reading many stackoverflow questions and answers, I still haven't been able to resolve the issue so any help specific to this problem would be appreciated. UPDATED: The error is posted below.

I am trying to create a database which retrieves daily price data from yahoo and inputs the data into the corresponding table.

The MySQL tables and database were created using the MySQL Workbench 6.1. I'm using Python 2.7 Anaconda distribution on Windows 8.1 64 bit.

Here is the MySQL table:

    -- -----------------------------------------------------
-- Table `securities_master_00`.`daily_price`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `securities_master_00`.`daily_price` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `data_vendor_id` INT NOT NULL,
  `symbol_id` INT NOT NULL,
  `price_date` DATETIME NOT NULL,
  `created_date` DATETIME NOT NULL,
  `last_updated_date` DATETIME NOT NULL,
  `open_price` DECIMAL(19,4) NULL DEFAULT NULL,
  `high_price` DECIMAL(19,4) NULL DEFAULT NULL,
  `low_price` DECIMAL(19,4) NULL DEFAULT NULL,
  `close_price` DECIMAL(19,4) NULL DEFAULT NULL,
  `adj_close_price` DECIMAL(19,4) NULL DEFAULT NULL,
  `volume` BIGINT NULL DEFAULT NULL,
  INDEX `index_data_vendor_id_idx` (`data_vendor_id` ASC),
  PRIMARY KEY (`id`),
  INDEX `index_symbol_id_idx` (`symbol_id` ASC),
  CONSTRAINT `index_data_vendor_id`
    FOREIGN KEY (`data_vendor_id`)
    REFERENCES `securities_master_00`.`data_vendor` (`id`)
    ON DELETE NO ACTION
    ON UPDATE CASCADE,
  CONSTRAINT `index_symbol_id`
    FOREIGN KEY (`symbol_id`)
    REFERENCES `securities_master_00`.`symbol` (`id`)
    ON DELETE NO ACTION
    ON UPDATE CASCADE)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

Here is the corresponding python code that generates the error:

def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
    '''
    Takes a list of tuples of daily data and adds it to the MySQL database.
    Appends the vendor ID and symbol ID to the data.

    daily_data: List of tuples of the OHLC data (with adj_close and volume)
    '''

    # Create the time now
    now = datetime.datetime.utcnow()

    # Amend the data to include the vendor ID and symbol ID
    daily_data = [(data_vendor_id, symbol_id, d[0], now, now, 
                   d[1], d[2], d[3], d[4], d[5], d[6]) for d in daily_data]

    # Create the insert strings 
    column_str = '''data_vendor_id, symbol_id, price_date, created_date,                    
                last_updated_date, open_price, high_price, low_price,
                close_price, volume, adj_close_price'''
    insert_str = ('%s, ' * 11)[:-2]
    final_str = 'INSERT INTO daily_price (%s) VALUES (%s)' % \
                (column_str, insert_str)

    # Using the MySQL connection, carry out an INSERT INTO for every symbol
    with con:
        cur = con.cursor()
        cur.executemany(final_str, daily_data)

if __name__ == '__main__':
    # Loop over the tickers and insert the daily historical data into the database
    tickers = obtain_list_of_db_tickers()
    for t in tickers:
        print 'Adding data for %s' % t[1]  
        yf_data = get_daily_historic_data_yahoo(t[1])
        insert_daily_data_into_db('1', t[0], yf_data) # I believe the error is here relating to the data vendor id # but am unclear on the method to solve the problem

ERROR CODE:

    Adding data for ABT
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Owner\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "C:/Users/Owner/Documents/Python Scripts/price_retrieval_mine.py", line 99, in <module>
    insert_daily_data_into_db('1', t[0], yf_data)
  File "C:/Users/Owner/Documents/Python Scripts/price_retrieval_mine.py", line 91, in insert_daily_data_into_db
    cur.executemany(final_str, daily_data)
  File "C:\Users\Owner\Anaconda\lib\site-packages\MySQLdb\cursors.py", line 262, in executemany
    r = self._query('\n'.join([query[:p], ',\n'.join(q), query[e:]]))
  File "C:\Users\Owner\Anaconda\lib\site-packages\MySQLdb\cursors.py", line 354, in _query
    rowcount = self._do_query(q)
  File "C:\Users\Owner\Anaconda\lib\site-packages\MySQLdb\cursors.py", line 318, in _do_query
    db.query(q)
_mysql_exceptions.IntegrityError: (1062, "Duplicate entry '1' for key 'data_vendor_id_UNIQUE'")

UPDATED output using SQL command: SHOW INDEXES FROM securities_master_00.daily_price

    Table   Non_unique  Key_name    Seq_in_index    Column_name Collation   Cardinality Sub_part         Packed Null    Index_type
    daily_price 0   PRIMARY 1   id  A   2   NULL    NULL        BTREE
    daily_price 0   data_vendor_id_UNIQUE   1   data_vendor_id  A   2   NULL    NULL        BTREE
    daily_price 0   symbol_id_UNIQUE    1   symbol_id   A   2   NULL    NULL        BTREE
    daily_price 1   index_data_vendor_id_idx    1   data_vendor_id  A   2   NULL    NULL        BTREE
    daily_price 1   index_symbol_id_idx 1   symbol_id   A   2   NULL    NULL        BTREE

Upvotes: 0

Views: 1112

Answers (1)

Air
Air

Reputation: 8615

As the SHOW INDEXES statement result indicates, there are five indexes on your table, though only three are declared in your CREATE TABLE statement. The two extra indexes are UNIQUE indexes on your foreign key columns, which is a problem because you need to have a many-to-one relationship between the daily_price table and either of the data_vendor and symbol tables. This reflects the fact that many prices will be generated by the same vendor and, over some period of time, for the same symbols.

You need to DROP both of these extra indexes - or alternatively, DROP the daily_price table and recreate it using the table definition that you posted in this question - in order to stop throwing an IntegrityError when you try to insert rows into the table.

Put more plainly, the data_vendor_id_UNIQUE index on the table prevents you from ever having two rows in that table with the same data_vendor_id. Since every row you insert with data from Yahoo has data_vendor_id = 1, according to the last line of your Python code (presumably this corresponds with Yahoo's entry in the data_vendor table), the second row you try to insert violates the unique constraint of that index and produces the error you see here.

It would be a good idea for you to try to figure out where these extra indexes came from, especially if you're working with someone else on this project or using someone else's code. It's possible there are less obvious problems hiding behind this error.

Finally, it will be well worth your time to learn about indexes, how they work, and when to use them, if you plan to do any serious work with MySQL. You should try to become familiar with statements like SHOW INDEXES and especially EXPLAIN when it comes to query execution plans, so that you can diagnose errors like this one quickly and easily.

Upvotes: 1

Related Questions