Reputation: 586
Software versions:
PHP 8.1.5 (cli)
mysql Ver 8.0.29-0ubuntu0.20.04.3 for Linux on x86_64 ((Ubuntu))
After migrating our database to the new server and new software I noticed strange behaviour which could be shown with this simplified code snippet below:
<?php
$host = '<remote host (external IP)>';
$db = '<db name>';
$user = '<user>';
$pass = '<password>';
$charset = 'utf8mb4';
$dsn = "mysql:host=$host;dbname=$db;charset=$charset";
$options = [
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
PDO::ATTR_EMULATE_PREPARES => false,
PDO::ATTR_TIMEOUT => 5
];
try {
//executing first statement, no problem
echo 'CREATING PDO OBJECT'.PHP_EOL;
$pdo = new PDO($dsn, $user, $pass, $options);
echo 'PDO OBJECT CREATED, PREPARING STATEMENT'.PHP_EOL;
$stmt = $pdo->prepare('SELECT *
FROM someObject
WHERE objectID = 1');
echo 'STATEMENT PREPARED, EXECUTING STATEMENT'.PHP_EOL;
$stmt->execute();
echo 'STATEMENT EXECUTED, FETCHING RESULT: ';
$result = $stmt->fetch();
echo count($result).PHP_EOL;
$sleep = 258;
echo 'SLEEP: '.$sleep.PHP_EOL;
sleep($sleep);
echo 'WOKE UP'.PHP_EOL;
//executing second statement after sleep, hangs
echo 'PREPARING STATEMENT'.PHP_EOL;
$stmt = $pdo->prepare('SELECT *
FROM someObject
WHERE objectID = 2'); //hangs here
echo 'STATEMENT PREPARED, EXECUTING STATEMENT'.PHP_EOL;
$stmt->execute();
echo 'STATEMENT EXECUTED, FETCHING RESULT: ';
$result = $stmt->fetch();
echo count($result).PHP_EOL;
} catch (\PDOException $e) {
throw new \PDOException($e->getMessage(), (int)$e->getCode());
}
This script output is:
CREATING PDO OBJECT
PDO OBJECT CREATED, PREPARING STATEMENT
STATEMENT PREPARED, EXECUTING STATEMENT
STATEMENT EXECUTED, FETCHING RESULT: 20
SLEEP: 258
WOKE UP
PREPARING STATEMENT
After this, the process becomes stale and does nothing, PREPARING STATEMENT is the last sentence I see in the output until max execution time is reached - then the process is dropped by PHP. It happens always if logic reaches 258 seconds of database inactivity.
If I reduce sleep time to 257 seconds - it always works, I see a second result, and the script finishes successfully. For me, it looks like there is some parameter that blocks connection after 257 seconds of inactivity. On the MySQL side, I see following
SHOW FULL PROCESSLIST;
+------+-----------------+---------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+-----------------+---------------------+------------------+
| <id> | <user> | <host> | <db> | Sleep | 1258 | | NULL |
+------+-----------------+---------------------+------------------+
As you may see - Time is 1258 here, it never gets closed (only after it reaches MySQL wait_timeout). MySQL timeouts below
+-----------------------------------+----------+
| Variable_name | Value |
+-----------------------------------+----------+
| connect_timeout | 10 |
| delayed_insert_timeout | 300 |
| have_statement_timeout | YES |
| innodb_flush_log_at_timeout | 1 |
| innodb_lock_wait_timeout | 50 |
| innodb_rollback_on_timeout | OFF |
| interactive_timeout | 28800 |
| lock_wait_timeout | 31536000 |
| mysqlx_connect_timeout | 30 |
| mysqlx_idle_worker_thread_timeout | 60 |
| mysqlx_interactive_timeout | 28800 |
| mysqlx_port_open_timeout | 0 |
| mysqlx_read_timeout | 30 |
| mysqlx_wait_timeout | 28800 |
| mysqlx_write_timeout | 60 |
| net_read_timeout | 30 |
| net_write_timeout | 60 |
| replica_net_timeout | 60 |
| rpl_stop_replica_timeout | 31536000 |
| rpl_stop_slave_timeout | 31536000 |
| slave_net_timeout | 60 |
| ssl_session_cache_timeout | 300 |
| wait_timeout | 28800 |
+-----------------------------------+----------+
23 rows in set (0.00 sec)
You may say that the DB connection shouldn't be kept open if not needed and I agree with that, it is what I did to fix this problem. However, on the previous server, I noticed nothing like that and wondering what's happening here. Please, don't suggest using the persistent connection, I want to find the reason, not fix consequences.
Upvotes: 2
Views: 210
Reputation: 586
This ended up being on the Azure side. Since both servers were hosted there, I found that Azure Firewall considers TCP connection dead and drops it after 4 minutes of inactivity as stated here https://learn.microsoft.com/en-us/azure/firewall/firewall-faq#what-is-the-tcp-idle-timeout-for-azure-firewall
It is about any connection, not only about SQL one. Maybe this would be useful to someone
Could be fixed by changing TCP kernel parameters default value: net.ipv4.tcp_keepalive_time to any value lower than 4 minutes
Upvotes: 1