Auto-incremented values can appear out of order if inserted in transactions

Last week I was surprised to learn that auto-incrementing field values in MySQL don't always necessarily appear in sorted order to clients. That is, it's possible for a count of rows in a table to increase while the max of an auto-inc col remains unchanged. Auto-incremented values are created when the record is inserted, not necessarily when the transaction during which the record is inserted is committed.

One of our systems at work can replicate updated rows from a table given that a column in it contains strictly incrementing values denoting the order of updates. I.e., if our system observes a row where the replication column has value X, subsequent replication jobs will only query for rows with a replication column value >= X. This is useful for efficiently replicating tables with a modified_at column or an append-only table with an incrementing ID. Most of the time...

We have a user who was inserting rows very rapidly into one such source database. They were using an AUTO_INCREMENTing column as a replication key for our system. In general there's no issue with doing this. We found, however, that in this case, rows were being inserted in batches as part of transactions happening parallel.

I discovered that auto-incrementing values are created at insertion time, not at commit time. In the case of relatively long-running transactions that happen simultaneously, this can result in auto-inc field values being visible in an unpredictable order.

It's fairly easy to illustrate this problem. Here I use two MySQL connections to show what can happen when there are multiple transactions inserting into a table simultaneously.

First, create a table with an auto-inc field and insert a row.

mysql> CREATE TABLE animals (
    ->      id MEDIUMINT NOT NULL AUTO_INCREMENT,
    ->      name CHAR(30) NOT NULL,
    ->      PRIMARY KEY (id)
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> insert into animals (name) values ('aardvark');
Query OK, 1 row affected (0.02 sec)

Observe that a simple SELECT query returns the row, as expected. The id field starts at 1.

mysql> select * from animals;
+----+----------+
| id | name     |
+----+----------+
|  1 | aardvark |
+----+----------+
1 row in set (0.00 sec)

Now, start a new transaction and insert some more records:

mysql> start transaction ;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into animals (name) values ('bear'), ('cow'), ('dinosaur');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

In another terminal, connect to the database and select from the table. You'll see only the original row is visible. The three records inserted during the transaction have not yet been committed.

mysql> select * from animals;
+----+----------+
| id | name     |
+----+----------+
|  1 | aardvark |
+----+----------+
1 row in set (0.00 sec)

Now, still in the second terminal, start a separate transaction and insert three more records. This time, though, commit the transaction changes.

If you query the database after that commit, you'll see that the three most recently committed rows are visible and are returned by the query. Also note, importantly, that auto-incrementing id field values have incremented past the values that were inserted but not yet committed.

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into animals (name) values ('elephant'), ('frog'), ('groundhog');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from animals;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | aardvark  |
|  5 | elephant  |
|  6 | frog      |
|  7 | groundhog |
+----+-----------+
4 rows in set (0.00 sec)

Back in the first terminal, you can commit the first transaction. Now all the records will be returned by a SELECT.

mysql> commit;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from animals;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | aardvark  |
|  2 | bear      |
|  3 | cow       |
|  4 | dinosaur  |
|  5 | elephant  |
|  6 | frog      |
|  7 | groundhog |
+----+-----------+
7 rows in set (0.00 sec)

I think the moral of the story is: don't rely on the order in which values in an AUTO_INCREMENT column appear. There could be rows in transactional flight with smaller values than the max of the auto-inc column.