goodspeed

Reputation: 376

Why is a MySQL query execution in Node.js so much slower than a direct MySQL query execution?

Query: select id, event_time from events where event_time > 1395797406712 and event_time < 1398389406712 order by event_time asc.

This query returns ~25k rows (500 KB in total size).

When I run the above query in Node.js using the node-mysql driver, it takes ~3-4 seconds to execute. (I measured with console.time and console.timeEnd.)

When I run it directly in MySQL, it reports ~200 ms.

What accounts for this immense difference, and how do I improve the Node.js implementation to be in line with the direct MySQL query?
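
A minimal sketch of the timing setup with node-mysql (the connection settings are placeholders):

var mysql = require('mysql');

var connection = mysql.createConnection({
  host: 'my-rds-host',   // placeholder
  user: 'app',           // placeholder
  password: 'secret',    // placeholder
  database: 'events_db'  // placeholder
});

var sql = 'select id, event_time from events ' +
          'where event_time > 1395797406712 and event_time < 1398389406712 ' +
          'order by event_time asc';

console.time('events query');
connection.query(sql, function (err, rows) {
  if (err) throw err;
  console.timeEnd('events query'); // ~3-4 s here vs ~200 ms in the mysql client
  connection.end();
});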

Upvotes: 6

Views: 4033

Answers (3)

Shanon Jackson

Reputation: 6531

If you wrap any long query in this

SELECT COUNT(1) FROM (
   <your query here>
) xyz

You'll see that the execution time matches any database tool (DBeaver, etc.) and the mysql CLI.
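
For example, a quick way to see the split from Node (a sketch; assumes a node-mysql connection named connection already exists):

var sql = 'select id, event_time from events ' +
          'where event_time > 1395797406712 and event_time < 1398389406712 ' +
          'order by event_time asc';

console.time('full rows');
connection.query(sql, function (err, rows) {
  if (err) throw err;
  console.timeEnd('full rows'); // query time + time to build all the row objects

  console.time('count wrapper');
  connection.query('SELECT COUNT(1) FROM (' + sql + ') xyz', function (err2, result) {
    if (err2) throw err2;
    console.timeEnd('count wrapper'); // roughly just the query time
  });
});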

The complicated part:

The time differential comes from converting the bytes MySQL returns (along with a table definition) into JavaScript objects. The more rows returned, the more work has to be done. When we SELECT COUNT(1), the amount of data being sent/parsed/serialized is only a few bytes, so what you're left with is how long the query itself actually takes from Node.

The difference between running the query directly and the COUNT(1) version is the parse/serialize time.

Database tools normally cheat here: if your query returns 150,000 rows, they'll typically only parse/serialize the first 200 rows, then parse/serialize the next 200 as you scroll, and so on. This doesn't re-run the query; the bytes for the next 200 are already sitting in a memory buffer.

After testing with mysql2, Prisma, and other tools that implement the MySQL protocol, it's clear that serialization (interop) is a real bottleneck in Node compared to other languages. Most likely this is because the JavaScript that takes the returned bytes and turns them into objects is not well optimized.

You can confirm this yourself: if you force MySQL to return a JSON_ARRAYAGG of the data and then parse that in Node using any of the common drivers, you'll see roughly a 500% performance increase (at the cost of the database doing more work).
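
A sketch of that approach, assuming MySQL 5.7.22+ (for JSON_ARRAYAGG/JSON_OBJECT) and the mysql2 promise API; the pool and the range values are placeholders, and depending on the driver version the JSON column may already come back parsed:

const mysql = require('mysql2/promise');

async function fetchEvents(pool, from, to) {
  // Ask MySQL to build one JSON string server-side instead of thousands of binary rows.
  const [rows] = await pool.query(
    "SELECT JSON_ARRAYAGG(JSON_OBJECT('id', id, 'event_time', event_time)) AS payload " +
    'FROM events WHERE event_time > ? AND event_time < ?',
    [from, to]
  );
  const payload = rows[0].payload;
  // One native JSON.parse call instead of building each row object in JS.
  return typeof payload === 'string' ? JSON.parse(payload) : payload;
}

// Usage (pool settings are placeholders):
// const pool = mysql.createPool({ host: 'my-rds-host', user: 'app', database: 'events_db' });
// const events = await fetchEvents(pool, 1395797406712, 1398389406712);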

This is because the serialization process on the Node.js side is a lot simpler; the algorithm becomes:

bytes are a UTF-8 JSON string -> convert to a UTF-8 string -> JSON.parse(string) [native]

This is much faster than:

bytes returned are numbers/strings/floats/etc. -> iterate through the bytes and build the row objects manually (basically a hand-rolled version of bytes -> JSON.parse), except that parsing isn't native code.

As a community, I think there's a serious challenge here for the Node.js ecosystem: we need to demand faster MySQL parsing/serialization. Other backend languages are eclipsing us in performance; I'm seeing differences as high as 25 seconds for 150,000 rows.

Debating the use cases for returning this much data is beside the point: if Java and Rust can make it fast, so can we.

Upvotes: 0

goodspeed

Reputation: 376

Turns out it was an Amazon backend problem. We're using Elastic Beanstalk, and the EC2 and RDS instances were in the same region but not in the same availability zone. Once I put them in the same one, i.e., us-east-1a, the query in Node.js took around ~200 ms.

Upvotes: 2

adpalumbo

Reputation: 3031

Are you using row streaming?

If not, node-mysql is building a 25k-element array of row objects, with all the allocations, iterations, and assignments that entails. And it's doing that in JavaScript (rather than nicely optimized native code), since node-mysql is pure JavaScript.

Row streaming lets you collect the information you need for your calculations one record at a time, much as you would with the traditional cursor-based MySQL adapters you see in most other environments.
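
A sketch of row streaming with node-mysql (assumes an existing connection; handleRow is a placeholder for your per-row work):

connection.query(
  'select id, event_time from events ' +
  'where event_time > 1395797406712 and event_time < 1398389406712 ' +
  'order by event_time asc'
)
  .on('error', function (err) {
    // handle errors; the stream ends after an error
    console.error(err);
  })
  .on('result', function (row) {
    // one row at a time instead of one 25k-element array
    handleRow(row); // placeholder
  })
  .on('end', function () {
    // all rows have been processed (or an error occurred)
  });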

Upvotes: 0
