user10091301

How to handle large data in Node.js

I have a MySQL database that contains data from an API. Suppose it has a unique field called gameid. Every time new data comes from the API, I run a query selecting every gameid, then use the Array.prototype.filter() method in JS (Node.js) to keep the records that are present in the API response but not in the database, and store that unique data in the database.

Something Like

let filtered_data = datafromapi.filter(data => !mysqldata.includes(data.gameid));

But with almost 30k records this takes a lot of time. Any idea how to do this with MySQL and Node.js?

Upvotes: 0

Views: 5619

Answers (2)

Shubham Dixit

Reputation: 1

As per Jonas's comment, an upsert is the most suitable approach: it lets you avoid inserting duplicate data (and the duplicate-key error you would otherwise hit). As an example, you can have a query like:

INSERT INTO table_name VALUES ('your column values') ON DUPLICATE KEY UPDATE gameid = gameid;

It is much easier and more hassle-free than storing 30k records in memory and then filtering them. Moreover, it limits your concerns to the database only. Reference: dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
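For illustration, here is a minimal sketch of driving that upsert from Node.js. It assumes the mysql2 package and a games table whose gameid column carries a UNIQUE or PRIMARY KEY index; the table, column, and connection details are placeholders for your own schema:

    const mysql = require('mysql2/promise');

    async function upsertGames(datafromapi) {
      const conn = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        database: 'mydb',
      });

      // One bulk INSERT for the whole batch; ON DUPLICATE KEY UPDATE
      // turns each duplicate-key error into a no-op, so rows whose
      // gameid already exists are silently skipped.
      const rows = datafromapi.map((d) => [d.gameid, d.name]);
      await conn.query(
        'INSERT INTO games (gameid, name) VALUES ? ' +
          'ON DUPLICATE KEY UPDATE gameid = gameid',
        [rows]
      );

      await conn.end();
    }

Note that query() (unlike execute()) expands the nested array after VALUES ? into one row tuple per element, so the whole batch goes over in a single statement instead of 30k round trips.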

Upvotes: 0

Sébastien Renauld

Reputation: 19672

Let's go over a few requirements and observable properties of your code, all from that one line:

  • You're receiving something that eventually maps into an array of structures as follows:

    {
      "gameid": "..."
    }
    
  • You want to filter based on the contents of another array, mysqldata.

Assuming you cannot change the return format of the API, you can leverage a property of your data to optimize at least some of the work.

Your API returns a list of independent objects. You can use this to your advantage, as you only need to perform one operation on each element to filter it. If you can get your API call to return a Reader rather than a readily-parsed JSON object, you can take advantage of this by using a streaming JSON parser instead of JSON.parse.

This type of parser emits a stream of tokens as it works through your array, rather than parsing the whole payload and returning it in one chunk, so you can test and discard each element as soon as it arrives.
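If that option is available, a minimal sketch might look like the following. It uses the stream-json npm package (one of several streaming parsers); the endpoint URL is hypothetical, and mysqldata stands for the id array from the question. As a small extra tweak, not part of the streaming idea itself, the known ids go into a Set so each membership check is constant-time instead of a linear Array.includes() scan:

    const https = require('https');
    const { parser } = require('stream-json');
    const { streamArray } = require('stream-json/streamers/StreamArray');

    const knownIds = new Set(mysqldata); // ids already stored in MySQL

    // Hypothetical endpoint returning a top-level JSON array of games.
    https.get('https://api.example.com/games', (res) => {
      const fresh = [];
      res
        .pipe(parser())
        .pipe(streamArray()) // emits { key, value } per array element
        .on('data', ({ value }) => {
          // Each element is tested as soon as it is parsed; the full
          // 30k-element array never has to sit in memory at once.
          if (!knownIds.has(value.gameid)) fresh.push(value);
        })
        .on('end', () => {
          // `fresh` now holds only the records missing from the database.
        });
    });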

This will not massively improve performance, since the largest share of your code's clock time is spent waiting for the network request to complete and on the filtering itself (30k membership checks, one per element, none of which can be skipped), so don't expect miracles.

The better way

The better way of doing it is to change the API endpoint, if at all possible, since that allows you to solve two problems in one go: less data transferred over the network, and fewer clock cycles spent filtering.

What you are effectively doing is a client-side (client of the database, that is) WHERE gameid IN (...). If you are allowed to modify the API call, you should take this into account.
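To make that concrete, here is a sketch of pushing the membership test into MySQL itself, again assuming the mysql2 package and a games table (all names are placeholders): only the ids of the incoming batch are sent to the database, which replies with the ones it already has.

    const mysql = require('mysql2/promise');

    async function findNewGames(datafromapi) {
      const ids = datafromapi.map((d) => d.gameid);
      if (ids.length === 0) return []; // IN () is invalid SQL, bail early

      const conn = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        database: 'mydb',
      });

      // The equivalent of the client-side filter, run where the index
      // lives: WHERE gameid IN (...) returns only the ids MySQL knows.
      const [rows] = await conn.query(
        'SELECT gameid FROM games WHERE gameid IN (?)',
        [ids]
      );
      await conn.end();

      const known = new Set(rows.map((r) => r.gameid));
      return datafromapi.filter((d) => !known.has(d.gameid));
    }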

Upvotes: 0
