Reputation: 8837
I am reading in some raw files and inserting them into a DB. This will involve MILLIONS of records, and for each record, I have multiple inserts (many tables). When I was testing locally, it was going quickly, but for the whole dataset, I need to work with a remote database. It is painfully slow this way, which I assume is because of all the trips over the network to delete/insert.
I am using the MySQLdb module (python) and at the moment, I have things such as the following:
# setup connection
con = mdb.connect('remote.host', 'database_user', '123456789', 'database_name');
... read files, loop through records, etc...
# clear out data related to current record
cur.execute("DELETE FROM articles WHERE article_id = %s", article.id)
cur.execute("DELETE FROM authors WHERE article_id = %s", article.id)
cur.execute("DELETE FROM addresses WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_references WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_patents WHERE article_id = %s", article.id)
# insert the article
cur.execute("INSERT INTO articles (article_id, doctype, keywords, language, title) VALUES (%s, %s, %s, %s, %s, %s)" , (article.id, article.doctype, ';'.join(article.keywords), article.language, article.title))
# insert all the authors
for au in article.authors:
cur.execute("INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)", (article.id, au.first_name, au.last_name, au.email))
... other loops like the authors to insert 10-20 citations per article, multiple addresses, etc ...
From what I can tell, MySQLdb does not allow me to send multiple queries at once. There must be a way for me to avoid the network delays. Any ideas?
Upvotes: 0
Views: 1508
Reputation: 6632
At least MySQLdb
1.2.3 seems to allow multiple queries out of the box, you just have to call cursor.nextset()
to cycle through the returned result sets.
db = conn.cursor()
db.execute('SELECT 1; SELECT 2;')
more = True
while more:
print db.fetchall()
more = db.nextset()
If you want to be absolutely sure the support for this is enabled, and/or disable the support, you can use something like this:
MYSQL_OPTION_MULTI_STATEMENTS_ON = 0
MYSQL_OPTION_MULTI_STATEMENTS_OFF = 1
conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_ON)
# Multiple statement execution here...
conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_OFF)
If there's an error while executing one of the queries, mysql won't execute any queries after that point. The db.execute()
call will throw the exception if it originated from the first query, otherwise the appropriate db.nextset()
will do it, so you can fetch the result sets from the successfull queries before getting the exception.
Upvotes: 4
Reputation: 838276
Use executemany
. Here's an example from the manual:
c.executemany(
"""INSERT INTO breakfast (name, spam, eggs, sausage, price)
VALUES (%s, %s, %s, %s, %s)""",
[
("Spam and Sausage Lover's Plate", 5, 1, 8, 7.95 ),
("Not So Much Spam Plate", 3, 2, 0, 3.95 ),
("Don't Wany ANY SPAM! Plate", 0, 4, 3, 5.95 )
] )
In your case it would look something like this:
sql = "INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)"
params = [(article.id, au.first_name, au.last_name, au.email) for au in article.authors]
cur.executemany(sql, params)
From the documentation of executemany
:
This method improves performance on multiple-row INSERT and REPLACE. Otherwise it is equivalent to looping over args with execute().
Upvotes: 1
Reputation: 2871
mySQL INSERT syntax does allow it. Compare 1) and 2)
1. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);
2. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
In the second case you're inserting three rows at once.
http://dev.mysql.com/doc/refman/5.5/en/insert.html
Hopefully it will give you ideas.
PS: this is language-independent way
Upvotes: 1