hackartist
hackartist

Reputation: 5264

Tricky MySQL batch query

I have a database of events which gets updated every night. A single event has information across three (or more) tables. Recently the volume of updates has caused my MySQL engine to be very slow to the point that my other queries get frozen while new events are being inserted. In order to speed things up I want to make a series of batch queries instead of having to do each one separately which I feel like is a large part of the overhead.

The problem is because the data is spread across several tables this is what I have to do to insert a single event:

   (in Mysql) INSERT INTO 'locations' (...) VALUES (...) ON DUPLICATE KEY UPDATE ...
   (in php get the last inserted id into variable $locationID)
   (in Mysql) INSERT INTO 'event_info' (...) VALUES ($locationID, ...) ON DUPLICATE KEY UPDATE...
   (in php get the last inserted id into variable $eventID)
   (in Mysql) INSERT INTO 'event_times' (...) VALUES ($eventID, ...) ON DUPLICATE KEY UPDATE...

I'm not looking for help in designing the tables, but as you can see to insert a single event requires at least three inserts each of which depends on getting the id from the previous one. This is why I didn't know where to begin for making this into a batch request. Any help for designing this process into a batch request would be awesome, thanks!

EDIT: I might already have the location or the event info previously and that is why the ON DUPLICATE KEY UPDATE part is there so that if it was already in the database I get the old id. I don't know until the insert if it is new data or if it already exists in the database. (because of that unless I am misunderstanding, I can't do things that preallocate the ids since this assume a new id every time.)

Upvotes: 2

Views: 1714

Answers (2)

Mike Chamberlain
Mike Chamberlain

Reputation: 407

Don't use auto incrementing columns - preallocate your reference id's before inserting. That way you can use a bulk insert and remove the dependency.

Update:

  1. Select any existing id's out of the database (ideally a single select for all known data).

  2. Enrich data to insert with any known id's. ( calculate a key for each item, which would correspond with the primary key for your table in the database, use that to update the item with the id from the database ) - you want to ultimately split the data into items which you know about in the database, and thus have an known id - and data which doesn't exist in the database, and thus needs a key allocating. I'm assuming your table has a primary key which isn't just the id - otherwise how else would the database know you already have the data in the database.

  3. Allocate new id's to any records without an id.

  4. bulk replace data in the database ( inserting multiple lines with a single statement ).

Upvotes: 2

Victor Sorokin
Victor Sorokin

Reputation: 12006

Don't know about details MySQL, but any self-respecting RDBMS has so-called "sequence" entities which are meant to be used as source of unique values suitable for tables' primary keys.

Below is approach to solve your problem using sequences, the one I encountered multiple times. Using pseudo-code:

  1. Start transaction
  2. Select 3 next values from some sequence. Again, any self-respecting RDBMS will guarantee that every query for "next value of sequence" will return unique value, suitable to be used as a primary key. Three selected values will be for primary keys of new records in locations, event_info, event_times;
  3. Perform INSERT INTO locations using value from 1st step;
  4. Perform INSERT INTO event_info using values from 1st step;
  5. Perform INSERT INTO event_times using values from 1st step;
  6. COMMIT transaction, if all was well. Otherwise, ROLLBACK transaction.

It's essential to make all INSERTs in one transaction. For further enhancements, you can batch your queries.

UPDATE To comply with your requirement about possible pre-existence of data prior to INSERTs

If your incoming updates are always contain whole set of data: that is location, event_info and event_times: Then use above approach and just delete old instances of records in tables. This assumes that you can SELECT old instances using some data other than primary key (that's data called domain-level primary key). Don't forget to make DELETE in the same transaction!

Upvotes: 2

Related Questions