I'm scraping pages using Beautiful Soup, and I would like to save some HTML snippets offline so that every time I scrape again I can compare against them and check whether the page has changed.
Aside from writing each snippet directly to an HTML file, what would be the best strategy (and format) for saving a lot of HTML snippets offline for later comparison?
Thank you
This is a classic use for a hash function. Algorithms like MD5 and SHA-256 boil any amount of text down to a short, fixed-size digest (16 bytes for MD5, 32 bytes for SHA-256). You can store just the hash of each snippet you parse, and then when you scrape a new version, calculate its hash and compare the two: if the hashes differ, the content has changed.
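A minimal sketch of that idea in Python, assuming each snippet is available as a string (for example `str(tag)` on a Beautiful Soup element) and that the hashes are kept in a single JSON file keyed by some identifier such as the page URL. The helper names and the JSON-file layout are hypothetical choices for illustration, not a standard API.

```python
import hashlib
import json
from pathlib import Path


def snippet_hash(html: str) -> str:
    """Return the SHA-256 hex digest (64 hex characters) of an HTML snippet."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def load_hashes(store: Path) -> dict:
    """Load previously saved hashes, or an empty dict if the store doesn't exist yet."""
    if store.exists():
        return json.loads(store.read_text(encoding="utf-8"))
    return {}


def check_and_update(store: Path, key: str, html: str) -> bool:
    """Return True if the snippet for `key` is new or changed, then save its latest hash.

    `key` is a hypothetical identifier of your choosing, e.g. the page URL.
    """
    hashes = load_hashes(store)
    new_hash = snippet_hash(html)
    changed = hashes.get(key) != new_hash
    hashes[key] = new_hash
    store.write_text(json.dumps(hashes, indent=2), encoding="utf-8")
    return changed
```

Because the hash covers the exact text, cosmetic differences such as whitespace or attribute order also count as changes; you may want to normalize the snippet (e.g. extract only the text you care about) before hashing. Also note that a hash only tells you *that* something changed, not *what*; if you need to diff the old and new versions, store the snippets themselves as well.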