Reputation: 16771
Recent Discovery
Among everything else I tried, I replaced my JMeter profile with a custom JavaScript that hit each of my API endpoints in turn in an infinite loop, and then ran this script in parallel in different browsers (one Firefox, one Chrome, one Safari) to rule out issues related to all of my connections coming from the same source (same user agent, same cookies, same session ID, etc.).
When I did this, all of my issues went away. The queries ran in parallel and the app was far more responsive than JMeter would have you believe.
It seemed impossible to me that JMeter would be serializing the requests, since it's a de facto standard for load testing, so I started trying to reproduce the behavior.
In an attempt to re-create the JMeter results, I wrote the following two PHP scripts which (hopefully) simulate my Yii application:
slow.php
<?php
session_start();
$_SESSION['some'] = 'value';
// Yii is calling session_write_close() almost immediately after
// the session is initialized, but to try and exacerbate issues,
// I've commented it out:
// session_write_close();
$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
$pdo = new PDO($dsn, "user", "password");
// Yii was using whatever the default persistence behavior was,
// but to try and exacerbate issues I set this flag:
$pdo->setAttribute(PDO::ATTR_PERSISTENT, true);
// Simulate a query running for 1 second by issuing a 1-second sleep
$pdo->query("DO SLEEP(1)");
echo "Done";
fast.php
<?php
session_start();
$_SESSION['some'] = 'value';
$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
$pdo = new PDO($dsn, "user", "password");
$pdo->setAttribute(PDO::ATTR_PERSISTENT, true);
// Simulate a query running for 0.1 seconds
$pdo->query("DO SLEEP(0.1)");
echo "Done";
Running JMeter against these two new endpoints, there was no serialization of requests. Everything ran in parallel: fast.php always returned in 100-150ms and slow.php always returned in 1000-1050ms, even as I scaled up to 3, 4, and 5 threads. I did watch things collapse at 11 threads, but only because I exceeded the number of available PHP-FPM workers.
So to summarize: despite the issue being non-existent when profiling with other tools, lots of people responded and gave lots of helpful information:
- Call session_write_close() as early as possible
- An UPDATE statement could potentially lock the tables

I inherited a web application and I'm trying to make sense of its performance profile so I can start optimizing it for speed.
One thing I noticed pretty early on is that requests to the server are being serialized. For instance, suppose I have three endpoints with response times like so:
/api/endpoint1 --> 50ms
/api/endpoint2 --> 1000ms
/api/endpoint3 --> 100ms
If I hit a single endpoint, I measure the appropriate response times. But when I set up a script to hit all 3 at once I will sometimes see something like the following:
endpoint1: 50ms
endpoint2: 1050ms
endpoint3: 1150ms
Clearly the call to endpoint3 was queued and waiting for the call to endpoint2 to finish before it got a response.
My first thought was that this should be trivially solved with multithreading, so I took a look at the server configuration. PHP-FPM's process manager was set to "dynamic" with "start_servers" of 1, "max_children" of 5, and "max_spare_servers" of 2. For testing purposes I swapped this to "static" so that 5 PHP processes would remain open for handling connections in parallel (more than the 3 for the number of endpoints I was hitting, so they should be able to process simultaneously)
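For reference, the pool settings described above live in the PHP-FPM pool config. The file path and pool name vary by install; this is a sketch of the relevant fragment, not my exact file:

```ini
; e.g. /etc/php-fpm.d/www.conf (path varies by distribution)
pm = static          ; was: dynamic
pm.max_children = 5  ; keep 5 workers running at all times

; these only apply when pm = dynamic:
; pm.start_servers = 1
; pm.max_spare_servers = 2
```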
This had no impact on performance, so I looked at my nginx config. "worker_processes" was set to 1 with "worker_connections" set to 1024. I know that nginx uses an event loop model, so it shouldn't be blocking while it waits for a response from PHP-FPM. But just in case, I bumped up "worker_processes" to 5
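The corresponding nginx directives look roughly like this (a sketch of the relevant fragment only, not my full config):

```nginx
# e.g. /etc/nginx/nginx.conf
worker_processes 5;   # was: 1

events {
    worker_connections 1024;
}
```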
Still, no effect. So next I looked at the database. All 3 endpoints hit the database, and I know for a fact that the 1000ms response time is mostly spent waiting on a long-running database query. I tried setting "thread_pool_size" to 5, and within the MySQL REPL I also set "innodb_parallel_read_threads" and "mysqlx_min_worker_threads" to 5.
Still, my requests were getting serialized. When I log into the MySQL REPL and run show processlist; while my script is running (a while-true loop repeatedly hitting those 3 API endpoints), I see only ever one connection from the web application's user.
Unfortunately I'm not sure if my issue lies with the database (not allowing more than one connection), with PHP-FPM (not processing more than one request at a time), or with nginx (not forwarding more than one request at a time to PHP-FPM). I'm also not sure how to figure out which one is acting as the bottleneck
Looking around some more I found this SO post which seems to suggest that MySQL doesn't support parallel queries from the same user (e.g. from the web application user)
Is this true? Surely such a ubiquitous database engine wouldn't have such a performance flaw, especially given how commonly it's used with AWS for massively scaled applications. I understand that for simple "read from disk" queries parallelizing them wouldn't improve performance since they'd just have to sit in a queue waiting on disk I/O, but modern databases have in-memory caches, and most of the really slow operations like filesort tend to happen in memory. There's no reason a disk-bound query couldn't run in parallel (make a request to disk and start waiting on I/O) while a cpu-bound query is busy sorting a table in RAM. The context switching may slightly slow down the cpu-bound queries, but if slowing those down from 1000ms to 1200ms means my 5ms query can run in 5 ms, I think that's worth it.
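One way to check whether MySQL serializes queries from the same user is to open two separate client sessions as the application user (a sketch; run each statement in its own session):

```sql
-- Session 1 (application user):
SELECT SLEEP(5);

-- Session 2 (same user, separate connection, started immediately after):
SELECT 1;
-- If same-user queries were serialized, session 2 would wait ~5 seconds;
-- with two distinct connections it should return immediately.
```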
Here are the queries for my 3 endpoints. Note that the timings listed are the response time for the full HTTP pipeline (from browser request to response) so this includes overhead from nginx and PHP, plus any post-processing of the query done in PHP. That said, the query in endpoint 2 makes up 99% of the runtime, and locks the database so that endpoints 1 and 3 are queued up instead of returning quickly.
endpoint1 (50ms)
SELECT * FROM Widget WHERE id = 1 LIMIT 1
(Note that 50ms is the full response time for the endpoint, not how long the query takes. This query is clearly on the order of microseconds)
endpoint2 (1000ms)
USE platypus;
SELECT `t`.`(49 fields)` AS `t0_cX`,
`site`.`(29 fields)` AS `t2_cX`,
`customer`.`(26 fields)` AS `t4_cX`,
`domain`.`(20 fields)` AS `t6_c0`,
`domain-general_settings`.`(18 fields)` AS `t8_cX`,
`domain-access_settings`.`(17 fields)` AS `t9_cX`,
`customer-general_settings`.`(18 fields)` AS `t10_cX`,
`customer-access_settings`.`(17 fields)` AS `t11_cX`,
`site-general_settings`.`(18 fields)` AS `t12_cX`,
`site-access_settings`.`(17 fields)` AS `t13_cX`,
`backup_broadcast`.`(49 fields)` AS `t14_cX`,
`playlists`.`(11 fields)` AS `t16_cX`,
`section`.`(10 fields)` AS `t17_cX`,
`video`.`(16 fields)` AS `t18_cX`,
`general_settings`.`(18 fields)` AS `t19_cX`,
`access_settings`.`(17 fields)` AS `t20_cX`
FROM `broadcast` `t`
LEFT OUTER JOIN `site` `site`
ON ( `t`.`site_id` = `site`.`id` )
LEFT OUTER JOIN `customer` `customer`
ON ( `site`.`customer_id` = `customer`.`id` )
LEFT OUTER JOIN `domain` `domain`
ON ( `customer`.`domain_id` = `domain`.`id` )
LEFT OUTER JOIN `generalsettings` `domain-general_settings`
ON ( `domain`.`general_settings_id` =
`domain-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `domain-access_settings`
ON
( `domain`.`access_settings_id` = `domain-access_settings`.`id` )
LEFT OUTER JOIN `generalsettings` `customer-general_settings`
ON ( `customer`.`general_settings_id` =
`customer-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `customer-access_settings`
ON ( `customer`.`access_settings_id` =
`customer-access_settings`.`id` )
LEFT OUTER JOIN `generalsettings` `site-general_settings`
ON ( `site`.`general_settings_id` =
`site-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `site-access_settings`
ON ( `site`.`access_settings_id` =
`site-access_settings`.`id` )
LEFT OUTER JOIN `broadcast` `backup_broadcast`
ON ( `t`.`backup_broadcast_id` = `backup_broadcast`.`id` )
AND ( backup_broadcast.deletion IS NULL )
LEFT OUTER JOIN `playlist_broadcast` `playlists_playlists`
ON ( `t`.`id` = `playlists_playlists`.`broadcast_id` )
LEFT OUTER JOIN `playlist` `playlists`
ON
( `playlists`.`id` = `playlists_playlists`.`playlist_id` )
LEFT OUTER JOIN `section` `section`
ON ( `t`.`section_id` = `section`.`id` )
LEFT OUTER JOIN `video` `video`
ON ( `t`.`video_id` = `video`.`id` )
AND ( video.deletion IS NULL )
LEFT OUTER JOIN `generalsettings` `general_settings`
ON ( `t`.`general_settings_id` = `general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `access_settings`
ON ( `t`.`access_settings_id` = `access_settings`.`id` )
WHERE
(
(
t.id IN (
SELECT `broadcast`.id FROM broadcast
LEFT JOIN `mediashare` `shares`
ON ( `shares`.`media_id` = `broadcast`.`id` )
AND `shares`.media_type = 'Broadcast'
WHERE
(
(
broadcast.site_id IN(
'489', '488', '253', '1083', '407'
)
OR
shares.site_id IN(
'489', '488', '253', '1083', '407'
)
)
)
)
)
AND
(
(
(
(t.deletion IS NULL)
)
)
AND
(
IF(
t.backup_mode IS NULL,
t.status,
IF(
t.backup_mode = 'broadcast',
backup_broadcast.status,
IF(
t.backup_mode = 'embed',
IF(
t.backup_embed_status,
t.backup_embed_status,
IF(
'2020-01-08 16:34:52' < t.date,
1,
IF(
t.date > Date_sub(
'2020-01-08 16:34:52',
INTERVAL IF(t.expected_duration IS NULL, 10800, t.expected_duration) second
),
10,
12
)
)
),
t.status
)
)
) != 0
)
)
)
LIMIT 10;
This query takes roughly 1000ms to run, but the PHP for the endpoint is extremely simple (run the query, return the results as JSON) and only adds a couple milliseconds of overhead
endpoint3 (100ms)
SELECT * FROM platypus.Broadcast
WHERE deletion IS NULL
AND site_id IN (SELECT id FROM platypus.Site
WHERE deletion IS NULL
AND customer_id = 7);
There's additional validation on the PHP side here which makes this endpoint take 100ms. The SQL, as you can see, is still fairly simple.
As there is a post length limit on Stack Overflow, I cannot show the CREATE TABLE for every single table touched by endpoint 2, but I can show at least one table. The others use the same engine.
CREATE TABLE `Widget` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`widget_name` varchar(255) NOT NULL,
`widget_description` varchar(255) NOT NULL,
`status` varchar(255) NOT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`auto_play` varchar(255) NOT NULL,
`on_load_show` varchar(255) NOT NULL,
`widget_content_source` varchar(255) NOT NULL,
`associated_sites` text NOT NULL,
`author_id` int NOT NULL,
`associated_sections` text,
`after_date` datetime DEFAULT NULL,
`before_date` datetime DEFAULT NULL,
`show_playlists` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`is_classic` tinyint(1) NOT NULL,
`default_site` int unsigned DEFAULT NULL,
`auth_code_url` varchar(255) DEFAULT NULL,
`widget_layout_id` int unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_Widget_widget_layout_id_WidgetLayout_id` (`widget_layout_id`),
CONSTRAINT `fk_Widget_widget_layout_id_WidgetLayout_id` FOREIGN KEY (`widget_layout_id`) REFERENCES `WidgetLayout` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1412 DEFAULT CHARSET=utf8
Notice that endpoint 2 doesn't even touch the Widget table, but endpoint 1 (which ONLY touches the Widget table) is also queued up. This eliminates the possibility of table locking.
When observing the process list in MySQL, only one connection is ever being made to the database from the application user. The issue may therefore lie in my PHP configuration.
Attached is the EXPLAIN SELECT ... output for the endpoint 2 query.
To try and determine where the parallel pipeline was falling apart, I created two simple scripts:
sleep.php
<?php
sleep(5);
echo "Done sleeping";
return.php
<?php
echo "That's all";
Doing this (sleeping in PHP) and running my script to hit these two endpoints with 3 threads, I saw no issues. return.php always came back in ~11ms, despite sleep.php taking 5066ms on average. I then tried doing the sleeping in MySQL:
sleep.php
<?php
$pdo = new PDO("...", "user", "pass");
$pdo->query("DO SLEEP(5)");
echo "Done sleeping";
This, again, had no issues. The sleeping endpoint did not block the non-sleeping one.
This means that the issue does not exist at the nginx, PHP, or PDO level - but that there must be some kind of table or row locking going on. I'm going to re-enable the general query log and scan through every query being performed to see if I can figure out what's happening.
If you scroll up to "Recent Discovery" at the top of this post, you'll notice that I've modified my understanding of the problem.
I was not having an issue with parallelization, but with JMeter. I have failed to create a simple repro case, but I know now that the issue does not lie with my application but rather with how I'm profiling it.
Upvotes: 1
Views: 2274
Reputation: 676
MySQL can handle a lot of parallel queries, but you can't run more than one query at a time on a single connection. The way PHP is usually set up, each request goes to a different thread/process, so each process has its own connection to MySQL and the problem you mention is avoided - unless you use persistent connections in PHP, in which case you might end up reusing the same connection for every request. If that's the case, it should be easy to disable persistence and go back to the standard one-database-connection-per-request model.
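If persistent connections were the culprit, disabling them is a one-line change where the connection is created (a sketch; the DSN and credentials are placeholders). Note that PDO only honors PDO::ATTR_PERSISTENT when it is passed to the constructor, not when set later via setAttribute():

```php
<?php
// Persistence must be requested in the constructor's options array;
// passing false (the default) gives one fresh connection per request.
$pdo = new PDO(
    "mysql:host=localhost;dbname=app", // placeholder DSN
    "user",
    "password",
    [PDO::ATTR_PERSISTENT => false]
);
```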
My first guess is that endpoint 2 triggers some locking on the database, and that's why endpoint 3's query is queued until endpoint 2's query finishes. This can be fixed by changing the logic in the code (avoid or minimize locking of the database), or by changing the database configuration or table engines to better suit the application's needs. For example: InnoDB uses row-level locking, while MyISAM locks the whole table.
Profiling will be really helpful if you don't mind configuring it. I suggest having a look at Blackfire.io, New Relic, or Xdebug profiling if you go this route. You will be able to find the bottlenecks faster this way.
Upvotes: 2
Reputation: 463
I think you have an issue with PHP session locking: your second and third requests are trying to access the same PHP session, and one is waiting for the other.
Try to call session_write_close() as soon as you can - that is, as soon as you are sure you will not write any more data to your PHP session - to free the session lock.
An easy way to check this is to try with 2 browsers, or in anonymous/incognito mode: your cookies will not be shared, so you will have 2 sessions that don't block each other.
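The pattern looks like this (a minimal sketch):

```php
<?php
session_start();

// Read or write whatever session data this request needs...
$userId = $_SESSION['user_id'] ?? null;

// ...then release the session file lock before doing slow work,
// so concurrent requests from the same session aren't serialized:
session_write_close();

// The long-running database query, API call, etc. happens here
// without holding the session lock.
```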
Upvotes: 3
Reputation: 142218
MySQL + PHP + Apache has 'always' been very good at running separate SQL statements in 'parallel'. If separate users issue HTTP requests, they go through Apache quickly (probably in sequence, but fast) and reach separate instances of PHP (assuming Apache is configured with enough 'children'). Each PHP script makes its own connection to MySQL. MySQL accepts multiple connections quite rapidly (assuming max_connections is high enough, which it is by default). Each MySQL connection works independently (barring low-level database locks, mutexes, etc.). Each finishes when it finishes; ditto for PHP, and for Apache returning results to the user.
I assume (without knowing for sure) that nginx works similarly.
Note: I suggested that Apache (and nginx) does things serially, but I suspect it takes on the order of a millisecond to hand off an HTTP request to PHP, so this "serial" step won't explain the timings you found.
I conclude that one of these is not really happening:

ENGINE=MyISAM uses table locking; this, alone, might explain the problem. (Please provide SHOW CREATE TABLE.)

It may be possible (after seeing the SQL) to speed up the SQL, thereby diminishing the overall problem of sluggishness.
Queries
Assuming id is the PRIMARY KEY of each table, then these other indexes may be beneficial at speeding up Query 2:
backup_broadcast: (deletion, id)
shares: (media_type, media_id, site_id)
broadcast: (site_id, id)
video: (deletion, id)
playlists_playlists: (playlist_id, broadcast_id)
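Expressed as DDL, those suggestions would look something like this (the index names are placeholders, and the aliases from the query are mapped back to their base tables; verify against your actual schema first):

```sql
-- backup_broadcast and broadcast are both aliases of `broadcast`
ALTER TABLE broadcast  ADD INDEX idx_deletion_id (deletion, id);
ALTER TABLE broadcast  ADD INDEX idx_site_id     (site_id, id);
-- shares is an alias of `mediashare`
ALTER TABLE mediashare ADD INDEX idx_type_media_site (media_type, media_id, site_id);
ALTER TABLE video      ADD INDEX idx_deletion_id (deletion, id);
-- playlists_playlists is an alias of `playlist_broadcast`
ALTER TABLE playlist_broadcast ADD INDEX idx_playlist_broadcast (playlist_id, broadcast_id);
```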
playlist_broadcast smells like a "many-to-many mapping" table. If so, I recommend following the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table (ditto for any similar tables).
OR and IN ( SELECT ... ) tend to be inefficient constructs. But it sounds like you don't have any control over the queries?
Is that a LIMIT without an ORDER BY?? Do you care which 10 rows you get?? It will not be predictable.
What happens with that huge number of columns? It seems like most of them will be the same every time you run the query, thereby making it mostly a waste of time??
For query 3, site needs INDEX(deletion, customer_id) (in either order). However, reformulating it to use a JOIN or EXISTS would probably run faster.
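A JOIN-based reformulation of query 3 along those lines might look like this (a sketch, untested against the real schema; since Site.id is the primary key, the join cannot duplicate Broadcast rows):

```sql
SELECT b.*
FROM platypus.Broadcast AS b
JOIN platypus.Site      AS s  ON s.id = b.site_id
WHERE b.deletion IS NULL
  AND s.deletion IS NULL
  AND s.customer_id = 7;
```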
Upvotes: 4
Reputation: 49373
Hm... too long for a comment.
A little bit simplified: every engine has one queue where it gathers queries to be computed, and depending on the hardware it uses 2 or 3 or even more threads to compute each query. The more threads are running, the more time every query needs, because of locks - for example, a table lock taken while inserting a new row with an auto-increment key (a search will turn up many examples of locks). Of course every query also needs memory and other resources, which it has to share with all the other software running on the server.
With clusters you pay the price in overhead for managing multiple SQL servers.
So from the SQL server's side it is parallel, but you need the hardware to support many threads/many engines (which should only be used very carefully).
Of course you can have many users in SQL, but for convenience's sake you usually have one per app, or sometimes even one per server. The same user can access the database simultaneously, though you can disable that.
Your PHP runs in parallel because web servers are built to run parallel requests, and it doesn't matter whether they run PHP, Python (Django), or JavaScript (Node.js) - Apache, IIS, nginx, and many more; every technology has its perks, and the more modules you add to an engine, the slower it gets.
So everything is parallel to a certain degree, and you can increase the power of such systems, as you see with cloud providers, virtual servers, and so on.
You only notice the limits at events like the launch of Pokémon Go or other new games, where even the huge cloud providers crash - or the disaster with ObamaCare, where nothing was tested at that scale, whoever was responsible for that.
Parallelizing such tasks is difficult because, while web servers and SQL servers do have caches where they park frequently-made requests, usually every request needs its own data.
In reality everything is much more complicated, starting with CPUs with multiple pipelines, multiple cores, and shared memory (which caused Meltdown and its brothers), through tables or databases that reside only in memory for high performance, to web servers that run entirely in CPU cache, which is much faster than memory or hard drives.
Upvotes: 0