Crerem

Reputation: 1359

What is the best approach when working with a REST API and big datasets?

I have a data provider (a REST API) that stores info about 400-500k items and is updated daily. The API methods I can call return info for only 1000 items at a time, but there is a pagination mechanism, so I can loop through all the data.

I'm working with PHP/MySQL, and my task is to check a website database (containing 10k to 100k items) against the data provided by this API. All I need to do is check that each item ID from the website database is present in the provider database. If it isn't, I delete the record from the website database.

What would be the best method to do this daily?

Should I first loop through the API, get all the data from the provider, and store it in a file (considering it is 400-500k IDs, I don't think an array will do), and then check each ID from the local database against that file? Something like the sketch below is what I have in mind.
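
To make the idea concrete, here is a rough sketch (the endpoint URL, the "page" parameter, the JSON shape, and the table/column names are all invented placeholders for my real setup):

    <?php
    // Rough sketch only. The endpoint URL, the "page" parameter, the JSON
    // shape {"items":[{"id":...},...]} and the table/column names
    // (website_items.item_id) are invented placeholders.

    // Step 1: page through the provider API and dump every ID to a file.
    $out = fopen('/tmp/provider_ids.txt', 'w');
    for ($page = 1; ; $page++) {
        $json  = file_get_contents('https://provider.example.com/items?page=' . $page);
        $batch = json_decode($json, true);
        if (empty($batch['items'])) {
            break; // past the last page
        }
        foreach ($batch['items'] as $item) {
            fwrite($out, $item['id'] . "\n");
        }
    }
    fclose($out);

    // Step 2: load the file into a temporary table, then delete local rows
    // with no matching provider ID. Requires local_infile to be enabled on
    // both the client and the MySQL server.
    $pdo = new PDO('mysql:host=localhost;dbname=website', 'user', 'pass',
                   [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
    $pdo->exec('CREATE TEMPORARY TABLE provider_ids (id INT UNSIGNED PRIMARY KEY)');
    $pdo->exec("LOAD DATA LOCAL INFILE '/tmp/provider_ids.txt' INTO TABLE provider_ids");
    $pdo->exec('DELETE w FROM website_items w
                LEFT JOIN provider_ids p ON p.id = w.item_id
                WHERE p.id IS NULL');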

Upvotes: 0

Views: 153

Answers (1)

Joel Peltonen

Reputation: 13432

I would refer to the "Rules Of Optimization Club" - specifically rules 1 and 2:

  1. You do not optimize.
  2. You do not optimize, without measuring first.

So first build a solution that works using your initial idea. Then measure how it performs. If it performs badly, see which parts of it are slow (server responses / saving data / looping through data) and only then start to think about optimization.

This is specifically in response to "considering it is 400-500k ids I don't think an array will do" -- did you try it, and did it fail?
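
For what it's worth, measuring that claim takes only a few lines of PHP (the IDs below are synthetic stand-ins for your real ones):

    <?php
    // Minimal measurement sketch: build a 500k-key array (IDs as keys, so
    // membership checks are O(1) via isset) and print peak memory use.
    // On PHP 7+ this typically lands at a few tens of MB, which does not
    // rule the array approach out up front.
    $ids = [];
    for ($i = 0; $i < 500000; $i++) {
        $ids[$i * 7 + 3] = true; // arbitrary non-sequential "item IDs"
    }
    var_dump(isset($ids[17])); // bool(true) -- an example membership check
    echo round(memory_get_peak_usage(true) / 1048576) . " MB\n";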

Upvotes: 1
