Reputation:
I have a MySQL database which contains data from an API. Suppose it has a field called gameid which is unique. Every time new data comes from the API, I run a query and select all gameid values, then with the Array.filter() method in JS (Node.js) I filter out the records which are present in the API response but not in the database, and store that unique data in the database.
Something like:
let filtered_data = datafromapi.filter(data => !mysqldata.includes(data.gameid));
But with almost 30k records this takes a lot of time. Any idea how to do such a process with MySQL and Node.js?
Upvotes: 0
Views: 5619
Reputation: 1
As per Jonas's comment, an upsert is the most suitable approach: it allows you to avoid inserting duplicate data (or the error you would otherwise encounter for duplicate records). As an example, you can use a query like:

insert into table_name values('your column values') on duplicate key update gameid = gameid

It is much easier and hassle-free compared to storing 30k records in memory and then filtering them. Moreover, it limits your concerns to the database only. Link for reference: dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
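For illustration, here is a minimal sketch of a bulk upsert from Node.js. It assumes the mysql2 package, a games table with a title column, and a UNIQUE (or primary key) index on gameid; the connection settings are placeholders:

const mysql = require('mysql2/promise');

async function storeGames(datafromapi) {
  // Placeholder credentials — adjust for your environment.
  const conn = await mysql.createConnection({
    host: 'localhost',
    user: 'root',
    password: 'secret',
    database: 'games_db',
  });

  // One row per API record; gameid must be covered by a UNIQUE index
  // so that ON DUPLICATE KEY can catch repeats.
  const rows = datafromapi.map(d => [d.gameid, d.title]);

  // Bulk insert: with mysql2's query(), `VALUES ?` expands a nested array
  // into (..),(..),... Duplicates become a no-op update instead of an error.
  await conn.query(
    'INSERT INTO games (gameid, title) VALUES ? ON DUPLICATE KEY UPDATE gameid = gameid',
    [rows]
  );

  await conn.end();
}

This sends all 30k rows in a single statement, so nothing has to be pre-filtered in JavaScript at all.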
Upvotes: 0
Reputation: 19672
Let's go over a few requirements and observable rules that can be inferred from that one line of code:
You're receiving something that eventually maps into an array of structures as follows:
{
    "data": ["game_ids"]
}
You want to filter based on the contents of another array, mysqldata.
Assuming you cannot change the return format of the API, you can leverage a property of your data to optimize at least some of it.
Your API return is a list of independent objects. You can use this to your advantage, as you will only need to perform one operation on each object to filter it. If you can get your API call to return a Reader as opposed to a readily-parsed JSON object, you can take advantage of this by using a streaming JSON parser instead of JSON.parse. This type of parser emits a stream of tokens as it reads through your array, as opposed to parsing the whole payload and then returning it in one chunk.
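As a sketch of that idea, here is what it could look like with the stream-json package (the package choice, the endpoint URL, and loading the known gameids into a Set beforehand are all assumptions, not part of your original code):

const https = require('https');
const { chain } = require('stream-chain');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

// known: a Set of gameids already in the database (Set.has is O(1),
// unlike Array.includes, which scans the whole array on every call).
function filterNewGames(known, onDone) {
  const fresh = [];
  https.get('https://api.example.com/games', res => { // placeholder URL
    const pipeline = chain([
      res,                       // the raw Reader (response stream)
      parser(),                  // tokenizes the JSON as bytes arrive
      pick({ filter: 'data' }),  // descend into the "data" array
      streamArray(),             // emit one { key, value } per element
    ]);
    pipeline.on('data', ({ value }) => {
      // One membership check per object, without ever holding
      // the full parsed payload in memory.
      if (!known.has(value.gameid)) fresh.push(value);
    });
    pipeline.on('end', () => onDone(fresh));
  });
}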
This will not massively increase the performance, since the largest part of your code's clock time will be spent waiting for the network request to complete and on the filtering itself (30k calls to Array.includes(), all of them impossible to avoid with this approach), so don't expect miracles.
The better way of doing it is to change the API endpoint, if at all possible, since that will let you solve two problems in one go: less data transferred over the network, and fewer clock cycles spent filtering.
What you are effectively doing is a client-side (to the database, that is) WHERE gameid IN (...). If you are allowed to modify the API call, you should take this into account.
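If the endpoint cannot change, you can still move that membership test into MySQL itself instead of JavaScript. A minimal sketch with mysql2 (the package choice, the games table name, and sending all ids in a single query are assumptions):

// conn: an open mysql2/promise connection.
// Returns the API records whose gameid is not yet in the games table.
async function findNewGames(conn, datafromapi) {
  const ids = datafromapi.map(d => d.gameid);

  // Let MySQL do the membership test; only the ids that already
  // exist come back over the wire.
  const [rows] = await conn.query(
    'SELECT gameid FROM games WHERE gameid IN (?)',
    [ids]
  );

  const existing = new Set(rows.map(r => r.gameid));
  return datafromapi.filter(d => !existing.has(d.gameid));
}

With 30k ids you may want to split the IN list into chunks, but the Set lookup already replaces 30k linear Array.includes() scans with constant-time checks.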
Upvotes: 0