dbau

Reputation: 16349

PHP - Cron Jobs that sync data from an external API. How's my methodology?

I was after some feedback on a PHP/MySQL based web app that I'm in the process of developing. The app is a member-based site which uses a local database to store data for each user by day. This data comes from an external API and needs to be automatically synced daily so that my local DB has up-to-date data. This is the methodology I have in mind:

I have 2 Cron Jobs:

  1. The Queue Builder

  2. The Queue Worker

..and 3 database tables:

  1. User Data (stores whatever user data I have so far, if any).

  2. User Details (a list of all members which includes users that I don't have data for as yet, aka new signups).

  3. The Processing Queue

The Queue Builder is a PHP script that will run via Cron at regular intervals. It will:

This way The Processing Queue table will contain a list of all the URLs that need to be queried.
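The builder's individual steps aren't listed above, only its result: a table of URLs still to be fetched. One plausible shape is a minimal sketch that finds every member in User Details with no User Data row for a given day and queues one URL per member. It assumes hypothetical table and column names (`user_details`, `user_data`, `processing_queue`, `status`) and a made-up `API_URL` template. It is written in Python with SQLite so it runs standalone; the SQL carries over to MySQL with `INSERT IGNORE` in place of `INSERT OR IGNORE`.

```python
import sqlite3

# Hypothetical endpoint: the real API URL is not given in the question.
API_URL = "https://api.example.com/users/{user_id}/data?day={day}"


def build_queue(conn: sqlite3.Connection, day: str) -> int:
    """Queue one fetch URL for every member with no stored data for `day`.

    Assumes hypothetical tables:
      user_details(user_id)             -- every member, including new signups
      user_data(user_id, day, payload)  -- data synced so far
      processing_queue(url UNIQUE, user_id, status)
    Returns the number of jobs newly added to the queue.
    """
    missing = conn.execute(
        """
        SELECT d.user_id FROM user_details AS d
        WHERE NOT EXISTS (
            SELECT 1 FROM user_data AS u
            WHERE u.user_id = d.user_id AND u.day = ?
        )
        """,
        (day,),
    ).fetchall()
    before = conn.total_changes
    for (user_id,) in missing:
        url = API_URL.format(user_id=user_id, day=day)
        # UNIQUE(url) + INSERT OR IGNORE: re-running the builder before the
        # worker has caught up does not queue the same URL twice.
        conn.execute(
            "INSERT OR IGNORE INTO processing_queue (url, user_id, status) "
            "VALUES (?, ?, 'pending')",
            (url, user_id),
        )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_details (user_id INTEGER PRIMARY KEY);
        CREATE TABLE user_data (user_id INTEGER, day TEXT, payload TEXT);
        CREATE TABLE processing_queue (url TEXT UNIQUE, user_id INTEGER, status TEXT);
        INSERT INTO user_details VALUES (1), (2), (3);
        INSERT INTO user_data VALUES (1, '2024-01-01', '{}');
    """)
    print(build_queue(conn, "2024-01-01"))  # 2 (users 2 and 3 have no data yet)
    print(build_queue(conn, "2024-01-01"))  # 0 (already queued)
```

The unique constraint on `url` is what makes it safe for this cron job to run more often than the worker drains the queue.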

The Queue Worker is also a PHP Cron script that will:

This will also run regularly via a Cron job, so the idea is that data-syncing should be automated and users should have up-to-date data. My questions are:

  1. What are the general thoughts on my methodology? Are there any side effects to doing it this way? I'm a hobbyist developer without a CS background so always keen on gaining criticism and learning about best practices! =)

  2. When a new user signs up, I plan on showing them a "your data can take xx minutes to sync" message while redirecting them to Getting Started resources etc. This is probably okay for my initial release, but further down the track I'd like to refine it so users get an email notification when syncing is ready or can see a % progress. Does my current solution accommodate this easily? Or will I have headaches down the track?
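Because the queue is itself a table, a per-user progress figure can be read straight from it. A minimal sketch, assuming the same hypothetical `processing_queue(url, user_id, status)` table and a worker that marks rows `'done'` rather than deleting them (if rows were deleted, the total would shrink and the percentage would be meaningless):

```python
import sqlite3


def sync_progress(conn: sqlite3.Connection, user_id: int) -> float:
    """Percentage of a user's queued jobs marked 'done' (hypothetical schema)."""
    total, done = conn.execute(
        # status = 'done' evaluates to 1 or 0, so SUM counts finished jobs;
        # COALESCE covers the empty case where SUM returns NULL.
        "SELECT COUNT(*), COALESCE(SUM(status = 'done'), 0) "
        "FROM processing_queue WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if total == 0:
        return 100.0  # nothing queued for this user: treat as fully synced
    return 100.0 * done / total
```

Jobs left as `'pending'` or `'failed'` keep the figure below 100, so the first time it reaches 100 is a natural trigger point for a "your data is ready" email.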

Opinions are appreciated! Many, MANY thanks in advance - I hope I have explained this clearly!

Upvotes: 1

Views: 3570

Answers (1)

Robin

Reputation: 4260

Probably the best advice I can give you is this: KISS!! No, I'm not being over-affectionate, this stands for "Keep it simple, stupid!" and is arguably a very important engineering principle. With this in mind, the first question I'd ask is "why cron?" Would it be possible to run all of these tasks in real time when users sign up? If so, I'd say go with that for now and don't bother with cron. If you do decide to go with cron, I'd recommend the following:

  • Consider using a lock file to prevent multiple instances of your script running at the same time. For example, if you run the script every 5 minutes and each run takes 10 minutes to complete, overlapping instances could interfere with each other.
  • Using curl multi will probably put more strain on the target server than making single requests in a loop. If you want to be polite to the target server, it's probably best to make single requests with a short sleep between them.
  • If you only process 20 jobs at a time and your service becomes very popular, you could end up with an ever-growing work queue. For example, if 40 tasks are added each hour but only 20 are processed, the backlog keeps growing and the queue never empties.
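The first two points can be combined into one small worker: take a non-blocking exclusive lock so an overlapping cron run exits immediately, then work through a batch one request at a time with a short pause. A minimal sketch in Python, assuming the same hypothetical `processing_queue` table as above; `fetch` stands in for whatever calls the API and saves into User Data, and the lock path, batch size of 20 and pause length are placeholders. `rowid` is SQLite-specific (MySQL would use an auto-increment id column), and in PHP `flock()` with `LOCK_EX | LOCK_NB` plays the same role as the lock here.

```python
import fcntl
import os
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path: str):
    """Yield True if this process got an exclusive lock on lock_path.

    Yields False (without waiting) if another run already holds it.
    POSIX-only: uses flock(2) via the fcntl module.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        acquired = True
    except BlockingIOError:
        acquired = False
    try:
        yield acquired
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def run_worker(conn: sqlite3.Connection, fetch, batch_size: int = 20,
               pause: float = 0.5) -> int:
    """Process up to batch_size pending jobs sequentially; return how many ran."""
    jobs = conn.execute(
        "SELECT rowid, url FROM processing_queue WHERE status = 'pending' "
        "ORDER BY rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    for i, (rowid, url) in enumerate(jobs):
        if i:
            time.sleep(pause)  # short gap between requests to spare the API server
        try:
            fetch(url)  # hypothetical: call the API and save the result to user_data
            status = "done"
        except Exception:
            status = "failed"  # record the failure instead of aborting the batch
        conn.execute(
            "UPDATE processing_queue SET status = ? WHERE rowid = ?",
            (status, rowid),
        )
        conn.commit()  # commit per job so a crash keeps finished work marked
    return len(jobs)


def main(conn: sqlite3.Connection, fetch,
         lock_path: str = "/tmp/queue_worker.lock"):
    with single_instance(lock_path) as acquired:
        if not acquired:
            return None  # previous cron run is still working; skip this one
        return run_worker(conn, fetch)
```

On the third point, the backlog changes each hour by arrivals minus completions: with 40 jobs added and 20 processed per hour it grows by 20 jobs every hour. The batch size, the cron frequency, or both have to rise until the worker's hourly throughput exceeds the arrival rate.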

HTH.

Upvotes: 1
