Terix

Reputation: 1387

MySQL: lock only one row for SELECT

I have a kind of "ticket lottery distribution" problem. Each user that comes to me needs to get a unique code, taken from a table of available codes. Each user must get one and only one code, and each code must be given to one and only one user. The code provided to the user is the first available code in the table that isn't flagged as "used".

This issue is very similar to this: mysql - Locking rows for select query?

But with a huge difference: I have measured accesses of up to 18 users / second so far, so I need to lock only one row per user (the one I'm working on for that user). Locking the whole table could be an issue.

I've read that "READ COMMITTED" might be useful: https://www.percona.com/blog/2012/08/28/differences-between-read-committed-and-repeatable-read-transaction-isolation-levels/

but this is the first time I'm doing something like this and I'm a bit lost on how to proceed exactly, and how to test it later by simulating this heavy load before putting the code into production.

Upvotes: 0

Views: 2124

Answers (2)

elenst

Reputation: 3987

Since you only need one field from the table, maybe you could try something simpler than the referenced issue suggests. You don't really need a set of statements; you need a single update on one table that can return a value from the row it updated. So, this could actually work:

CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, INDEX(used));
CREATE TRIGGER pick_code AFTER UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code;
--populate the table

Now, every user in their connection runs

UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;

And then this should return the chosen code:

SELECT @your_code;

It's atomic, so you don't need a transaction for that, or explicit table locking. Whether to make the table InnoDB or MyISAM should be decided empirically, based on comparative performance in your environment, as it can depend on many things which would be out of scope here.
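One caveat worth sketching (not covered by the stub above, so treat it as an assumption about how you'd wire it up): if no free code is left, the UPDATE matches nothing, and @your_code may still hold a value from an earlier call in the same connection, so the client should check ROW_COUNT() before trusting it.

UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
-- ROW_COUNT() here still refers to the UPDATE above:
-- 1 means a code was just claimed, 0 means the pool is exhausted
SELECT IF(ROW_COUNT() > 0, @your_code, NULL) AS your_code;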


Notes on integrity

Please note that it's just a stub, not a complete solution. In reality you'll need some more logic to ensure all of your 4 requirements:

  • only one user gets a code;
  • at least one user gets a code;
  • only one code is given to a user;
  • at least one code is given to a user.

The stub addresses the last point. The first and second ones are a matter of crash-safety (you should be able to ensure that with proper InnoDB settings, even though in other ways InnoDB will be inferior to MyISAM for this flow). Finally, for the 3rd point you also need to store the information that the user has been given a code, but how depends on how your users are identified. E.g. it can be something like

CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, assignee VARCHAR(128), UNIQUE(assignee), INDEX(used));
CREATE TRIGGER pick_code BEFORE UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code, NEW.assignee = CURRENT_USER();

(just another stub -- it can be done in a completely different way).
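To illustrate how that second stub would behave (assuming, as CURRENT_USER() implies, that each of your users maps to a distinct MySQL account, which may well not be the case in your setup):

UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1; -- first call: claims a code
SELECT @your_code;
UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
-- a second call by the same account fails with a duplicate-key error (1062)
-- on UNIQUE(assignee), so "one code per user" is enforced by the schema itself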


Update (notes on having an index on the used column)

Since a question about the index on used was raised in the comments, I've run a quick informal benchmark. It's based on the solution above, but might also be worth considering with any other solutions which use similar structures and DML.

Disclaimers:

  • absolute values in the results are completely irrelevant, the tests were performed on a regular Debian desktop installation, not in any way tuned for benchmarking;
  • the results are not intended to prove that the suggested solution is good, only to check some points that were discussed;
  • the server was not tuned for InnoDB; one might achieve better performance from the InnoDB tables with proper configuration, so this is just a very rough comparison.

Test setup

  • MySQL server 5.6.34, 64-bit binary tarball from the official website
  • options: --innodb-buffer-pool-size=4G --innodb-flush-log-at-trx-commit=2, all other options are server defaults;
  • client tool: mysqlslap from the same package

Four tables are created. The structure is identical, apart from the engine (MyISAM vs InnoDB) and an index on the used column (index vs no index).

MySQL [test]> show create table codes_innodb \G
*************************** 1. row ***************************
       Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
  `code` char(17) NOT NULL,
  `used` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MySQL [test]> show create table codes_innodb_i \G
*************************** 1. row ***************************
       Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
  `code` char(17) NOT NULL,
  `used` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`code`),
  KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MySQL [test]> show create table codes_myisam \G
*************************** 1. row ***************************
       Table: codes_myisam
Create Table: CREATE TABLE `codes_myisam` (
  `code` char(17) NOT NULL,
  `used` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MySQL [test]> show create table codes_myisam_i \G
*************************** 1. row ***************************
       Table: codes_myisam_i
Create Table: CREATE TABLE `codes_myisam_i` (
  `code` char(17) NOT NULL DEFAULT '',
  `used` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`code`),
  KEY `used` (`used`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

Each table is populated with 50,000,000 rows of identical data (not similar, but actually identical).
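For reference, one hypothetical way to populate tables of this size (the original setup scripts are not shown here) is to seed a single row and then keep doubling with INSERT ... SELECT until the target row count is reached:

INSERT INTO codes_innodb (code) VALUES (LPAD('0', 17, '0'));
-- run the next two statements repeatedly; each run doubles the row count,
-- and the offset by the current count keeps every code unique
SET @n := (SELECT COUNT(*) FROM codes_innodb);
INSERT INTO codes_innodb (code)
SELECT LPAD(CAST(code AS UNSIGNED) + @n, 17, '0') FROM codes_innodb;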

Test flow

  • Two tests are performed on each table.
  • Each test is run with 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:

UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1

  • First test is started when used=0 for all rows (the "initial" state).
  • Second test is started when used=1 for 1,005,000 rows.

The test measures the total amount of time to execute all queries.

Results (in seconds)

| Table              |  Test 1  |  Test 2  |
|--------------------|----------|----------|
| MyISAM with index  |   0.459  |    0.333 |
| MyISAM, no index   |   3.425  |  801.383 |
| InnoDB with index  |  11.529  |    8.205 |
| InnoDB, no index   |  19.646  | 2403.297 |

So, at the beginning the results with and without the index are comparable, even though with the index they are somewhat better. However, when we have to go deeper into the data, the results change substantially. With the index they remain approximately the same (ignore fluctuations on low values), but without the index, the further into the data we go, the longer it takes.

This is quite expected, and here is why. With the index, regardless of where we are in the data, the UPDATE still performs only one key read and one rnd read:

MySQL [test]> select used, count(*) from codes_myisam_i group by used;
+------+----------+
| used | count(*) |
+------+----------+
|    0 | 48990000 |
|    1 |  1010000 |
+------+----------+
2 rows in set (12.08 sec)

MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> update codes_myisam_i set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+------------------+----------------+
| VARIABLE_NAME    | VARIABLE_VALUE |
+------------------+----------------+
| HANDLER_READ_KEY | 1              |
| HANDLER_READ_RND | 1              |
+------------------+----------------+
2 rows in set (0.00 sec)

But without the index, it performs as many rnd reads as there are rows that have already been updated:

MySQL [test]> select used, count(*) from codes_myisam group by used;
+------+----------+
| used | count(*) |
+------+----------+
|    0 | 48990000 |
|    1 |  1010000 |
+------+----------+
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> update codes_myisam set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| HANDLER_READ_RND_NEXT | 1010001        |
+-----------------------+----------------+
1 row in set (0.00 sec)

Of course, these results are very specific to this particular flow, where we perform numerous one-row updates and have to search for a row every time, so the penalty on the look-ups exceeds the penalty of maintaining the index. It would have been totally different if we had performed a bulk update:

MySQL [test]> update codes_innodb set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (7.80 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

MySQL [test]> update codes_innodb_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (56.91 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

MySQL [test]> update codes_myisam set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (1.21 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

MySQL [test]> update codes_myisam_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (14.56 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

There, naturally, updating tables with the extra index is many times slower than updating tables without the index. I think that's where the confusion in the comments came from.


Update 2 (notes on using a natural PK vs surrogate PK)

Another objection raised in the comments concerned using a natural primary key as opposed to a surrogate primary key; the concern was that it would hurt InnoDB performance. Here is a similar quick benchmark on this point.

Test setup

Same environment and server as in the previous test. Two InnoDB tables are in use.

First is the same one as before, with the natural PK:

       Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
  `code` char(17) NOT NULL,
  `used` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`code`),
  KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The second one has a surrogate PK (and a unique index on code, since we still want to ensure it's unique; in the first table the PK itself takes care of that):

       Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
  `code` char(17) NOT NULL,
  `used` tinyint(1) DEFAULT '0',
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`pk`),
  UNIQUE KEY `code` (`code`),
  KEY `used` (`used`)
) ENGINE=InnoDB AUTO_INCREMENT=50000001 DEFAULT CHARSET=latin1

50,000,000 rows in each table, identical data.

Test flow

  • The test starts when used=0 for all rows;
  • 10 consecutive test runs are performed on each table.
  • Each run is 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:

UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1

  • The table is not reset between the runs; that is, the first run starts with used=0 for all rows, the 2nd run starts with 5,000 rows having used=1, etc.

Each test measures the total amount of time to execute all queries.

Results (in seconds)

|              | Individual results                                           |    Avg |
|--------------|--------------------------------------------------------------|--------|
| natural PK   | 8.061,6.782,5.712, 5.524,7.854,6.166,6.095,4.911,4.435,4.784 | 6.0324 |
| surrogate PK | 9.659,8.981,8.080,11.257,9.621,6.722,6.457,5.937,6.308,6.624 | 7.9646 |

Even though the natural PK has shown somewhat better results, the environment is not tuned, so I wouldn't go as far as saying that the natural PK is superior here; it's quite possible that with proper server tuning and a better environment the outcome would change. But we can see that there is no performance drop from using a natural PK instead of a surrogate PK for this workflow. So, it's rather a question of personal preference.

Upvotes: 2

N.B.

Reputation: 14091

You don't need to lock rows, ever.

When dealing with uniqueness, you place UNIQUE constraints. Always. No exception, EVER.

The key is not performance but data integrity. That's what databases are for - to contain VALID data so you can create projections, plans, etc.

It's not difficult to make it fast. Do not sacrifice data integrity for performance.

Disclaimer: I did not test any of the SQL here. Everything I posted serves as an educational example. It might work if you copy paste it, I am not guaranteeing for the syntactical correctness of any SQL shown here.

The problem

Each user must get one and only one code, and each code must be given to one and only one user

This defines uniqueness. There's one code. One user can only have one code. Let's design the model.

The solution

  • We'll save codes into one table
  • We'll use a junction table to link users with codes. A junction table is used so we don't have to alter the users table for this approach to work
  • Codes will have a unique value (handled by a UNIQUE constraint)
  • The junction table will allow each code to be assigned to only one user, while the same user will still be able to have other codes assigned to them if the need occurs later on

Using this approach:

  • We will be getting errors about duplicate entries
  • We will handle those errors

Why are we going to get errors? Because we'll hit unique constraints and the database will simply deny the entry. But that's what we want and that's what we'll get. Locking the rows is not the solution to these types of problems: it's slower, you need to unlock the row(s) at some point, and it's prone to concurrency issues. The database handles uniqueness infinitely better than you or any other programmer can. Therefore, we're going to use the database's mechanisms.

The codes table

The codes table will contain the codes. You didn't define what the code looks like, so I'm going to assume it's going to be a string. I'll choose varchar(255) because I like it. Since the code has to be unique, I'll help myself a little with a trigger and a unique constraint. I'll hash the value of the code using SHA1, save it into a binary(20) column and make that binary column unique.

What I got out of this:

  • My codes can look like anything now, be a combination of any type of character
  • I always have a fixed-length index, so I can place a unique constraint without fear
  • I can search the codes using the hash, but display a friendlier version to the customer
  • There can be only one code value in the database, due to the unique constraint

CREATE TABLE codes (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    code_value VARCHAR(255) NOT NULL, -- this is the "friendly" value that customers get
    code_hash binary(20) default null, -- this is the SHA1 hash (20 bytes) of the above value
    is_used tinyint not null default '0', -- a small helper field when querying for which codes are "free"
    primary key(id),
    unique(code_hash)
) ENGINE = InnoDB;

The trigger that handles the hashing, so we don't have to provide the value ourselves:

DELIMITER $$

CREATE
TRIGGER `codes_before_insert` BEFORE INSERT
ON `codes`
    FOR EACH ROW BEGIN
        SET NEW.code_hash = UNHEX(SHA1(NEW.code_value));
END$$

DELIMITER ;
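For completeness, a lookup by the friendly value would then go through the hash column, along these lines (a sketch; the code value is a placeholder and the exact lookup query isn't shown in the answer):

SELECT id, code_value, is_used
FROM codes
WHERE code_hash = UNHEX(SHA1('SOME-FRIENDLY-CODE'));
-- the fixed-length hash is what the unique index covers, so this lookup
-- uses that index even though code_value itself is a free-form varchar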

The junction table

  1. Used so we don't have to alter any existing user-related tables
  2. Allows each code to be assigned to only one user
  3. Allows for one user to use multiple codes
  4. Contains foreign keys to enforce data integrity

The table:

CREATE TABLE user2code (
    id int unsigned not null auto_increment,
    user_id int unsigned not null,
    code_id int unsigned not null,
    primary key(id), -- the auto_increment column must be a key
    unique(code_id), -- this part allows each code to be assigned only once, ever
    foreign key(user_id) references users(id) on delete cascade,
    foreign key(code_id) references codes(id) on delete cascade
) ENGINE = InnoDB;

We're going to place a utility trigger on the junction table. When a code is used, we'll update the codes table and set is_used to 1. This is done for easier navigation between free and occupied codes. We could have achieved the same by JOIN-ing the user2code table onto the codes table, but we want to be a bit more performant.

DELIMITER $$

CREATE
TRIGGER `user2code_after_insert` AFTER INSERT
ON `user2code`
    FOR EACH ROW BEGIN
        UPDATE `codes` SET is_used = 1 WHERE id = NEW.code_id;
END$$

DELIMITER ;

How to use

INSERT INTO user2code 
(user_id, code_id) 
VALUES 
(1, (SELECT id FROM codes WHERE is_used = 0 LIMIT 1));

Outcomes:

  1. When successful, that's it!
  2. In case of an error, you handle it by repeating the transaction. You can repeat the transaction a few (3-4) times and if it still fails, ask the user to try again later. This is tied to UX, which I know nothing about in your case.

The key is knowing that failing transactions aren't bad. That's the database's way of telling us: hey, you can't do this, and here's why.
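As a rough illustration of what that failure and retry look like with the tables above (the values involved are hypothetical): if two users resolve the subquery to the same free code concurrently, the second insert is rejected and simply gets retried.

-- session 1 and session 2 both pick the same free code before either commits
INSERT INTO user2code (user_id, code_id)
VALUES (1, (SELECT id FROM codes WHERE is_used = 0 LIMIT 1)); -- session 1: OK
INSERT INTO user2code (user_id, code_id)
VALUES (2, (SELECT id FROM codes WHERE is_used = 0 LIMIT 1)); -- session 2:
-- fails with a duplicate-entry error (1062) on the unique index over code_id;
-- on retry the AFTER INSERT trigger has already set is_used = 1 for the taken
-- code, so the subquery picks a different one and the insert succeeds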

Placing locks is dangerous - it doesn't guarantee uniqueness. Only unique constraints do.

Good luck!

Upvotes: 2
