user1478894

Detecting web page updates with python

What would be the simplest way to check a web page for changes? I want to scan a web page every so often and compare it to an older scan. One problem is that the scan also needs to ignore certain changes, such as the time of day. I only want to detect relevant updates.

Upvotes: 1

Answers (1)

Sean Johnson

I won't write code, but I'll give you the process I'd go through for solving this problem:

  1. Retrieve the source of the page
  2. Strip out all of the parts of the page that we don't care to monitor (e.g. the time of day)
  3. Calculate an MD5 or SHA-1 hash of the source after those parts are removed
  4. Compare the new hash with the stored hash; if they differ, the page has been updated, so do whatever you need to do
  5. Store the new hash
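The answer deliberately gives no code, but the five steps above can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the parts to ignore are described by regular expressions (the single time-of-day pattern in `IGNORE_PATTERNS` is a hypothetical example to adapt per site), SHA-1 is used (swapping in `hashlib.md5` works the same way for change detection), and the hash is kept in a plain text file. All function names here are illustrative, not from any library.

```python
import hashlib
import re
import urllib.request
from pathlib import Path

# Hypothetical ignore list: regexes for regions that change on every load
# but do not count as a relevant update. Adjust these for the target page.
IGNORE_PATTERNS = [
    r"\d{1,2}:\d{2}(:\d{2})?\s*(AM|PM)?",  # times such as 10:42 or 10:42:07 PM
]


def fetch_source(url, timeout=10):
    """Step 1: retrieve the page source as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def strip_ignored(source, patterns=IGNORE_PATTERNS):
    """Step 2: delete every region matching one of the ignore patterns."""
    for pattern in patterns:
        source = re.sub(pattern, "", source)
    return source


def page_hash(source, patterns=IGNORE_PATTERNS):
    """Step 3: SHA-1 hex digest of the source after ignored parts are removed."""
    cleaned = strip_ignored(source, patterns)
    return hashlib.sha1(cleaned.encode("utf-8")).hexdigest()


def check_for_update(source, stored_hash, patterns=IGNORE_PATTERNS):
    """Step 4: compare against the stored hash; return (changed, new_hash)."""
    new_hash = page_hash(source, patterns)
    return new_hash != stored_hash, new_hash


def load_hash(path):
    """Read the previously stored hash, or None if there is none yet."""
    try:
        return Path(path).read_text().strip()
    except FileNotFoundError:
        return None


def save_hash(path, digest):
    """Step 5: store the new hash for the next scan."""
    Path(path).write_text(digest)
```

A scheduled job (cron, or a loop with `time.sleep`) would call `fetch_source(url)`, pass the result together with `load_hash(path)` to `check_for_update`, act if `changed` is true, and then call `save_hash(path, new_hash)`. On the first run `load_hash` returns `None`, so the page is reported as changed. Regex removal is crude for HTML; if the dynamic parts sit in identifiable elements, removing them with an HTML parser before hashing is likely more robust.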

Upvotes: 4