I googled and couldn't find any code that would compare a web page to a previous version of itself.
In this case the page I'm trying to watch is link text. There are services that can watch a page, but I'd like to set this up on my own server.
I've set this up as a wiki so anyone can add to the code. Here's my idea: this script would be called nightly via cron, or on demand via the browser (the latter is not a priority).
It sounds simple; maybe I'm just not looking in the right place.
Perhaps a simple sh script like this one, using wget, diff and test?
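#!/bin/sh
# Fetch a web page, compare it with the copy saved on the previous run,
# and mail the differences (if any) to the current user.

WWWURI="http://foo.bar/testfile.html"   # page to watch
LOCALCOPY="testfile.html"               # copy kept between runs
TMPFILE="$(mktemp)"                     # holds the diff output
WEBFILE="changed.html"                  # copy of the last notification (can be served to a browser)
MAILADDRESS="$(whoami)"

SUBJECT_NEWFILE="$LOCALCOPY is new"
BODY_NEWFILE="first version of $LOCALCOPY loaded"
SUBJECT_CHANGEDFILE="$LOCALCOPY updated"
SUBJECT_NOTCHANGED="$LOCALCOPY not updated"
BODY_CHANGEDFILE="new version of $LOCALCOPY"

# test for old file
if [ -e "$LOCALCOPY" ]
then
    mv "$LOCALCOPY" "$LOCALCOPY.bak"
    if ! wget -q "$WWWURI" -O "$LOCALCOPY"
    then
        # download failed: restore the previous copy so the next run still has a baseline
        mv "$LOCALCOPY.bak" "$LOCALCOPY"
        rm -f "$TMPFILE"
        exit 1
    fi
    # old version first, so lines marked ">" are the new content
    diff "$LOCALCOPY.bak" "$LOCALCOPY" > "$TMPFILE"
    # test for update: a non-empty diff means the page changed
    if [ -s "$TMPFILE" ]
    then
        echo "$SUBJECT_CHANGEDFILE"
        ( echo "$BODY_CHANGEDFILE" ; cat "$TMPFILE" ) | tee "$WEBFILE" | mail -s "$SUBJECT_CHANGEDFILE" "$MAILADDRESS"
    else
        echo "$SUBJECT_NOTCHANGED"
    fi
else
    # first run: just store the page
    wget -q "$WWWURI" -O "$LOCALCOPY"
    echo "$BODY_NEWFILE"
    echo "$BODY_NEWFILE" | tee "$WEBFILE" | mail -s "$SUBJECT_NEWFILE" "$MAILADDRESS"
fi
rm -f "$TMPFILE"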
Update: piped the output through tee, fixed some spelling, and added removal of $TMPFILE.
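To run a script like this nightly, as the question asks, a crontab entry along these lines should work. The directory `/path/to/workdir` and the script name `watchpage.sh` are hypothetical placeholders. The `cd` matters because the script uses relative file names, and stdout is discarded because the script already sends its own mail.

```
# m  h  dom mon dow  command
  0  3  *   *   *    cd /path/to/workdir && ./watchpage.sh >/dev/null 2>&1
```

Install it with `crontab -e`; this example runs the script every day at 03:00.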
You can check this SO posting for a few ideas, and also for information about the challenge of detecting "true" changes to a web page (with fluctuating advertisement blocks and other "noise").
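One crude way to reduce that noise, sketched here under assumptions, is to normalize both copies before comparing them: drop lines matching a pattern known to change on every load, strip trailing whitespace, then compare what remains. The pattern `NOISE_RE` and the helper names `normalize` and `really_changed` are hypothetical and have to be adapted to the watched page. This is a line-based filter, not an HTML parser, so it only helps when the noisy parts sit on their own lines.

```shell
#!/bin/sh
# Hypothetical noise pattern: lines matching it are ignored when deciding
# whether the page really changed. Adapt it to the page you watch.
NOISE_RE='class="ad"|Generated at [0-9:]*'

# normalize FILE: print FILE without noisy lines and trailing whitespace
normalize() {
    grep -Ev "$NOISE_RE" "$1" | sed 's/[[:space:]]*$//'
}

# really_changed OLD NEW: exit status 0 if the pages still differ after
# normalization, 1 if only "noise" lines changed
really_changed() {
    old_norm="$(mktemp)"
    new_norm="$(mktemp)"
    normalize "$1" > "$old_norm"
    normalize "$2" > "$new_norm"
    if cmp -s "$old_norm" "$new_norm"
    then
        status=1
    else
        status=0
    fi
    rm -f "$old_norm" "$new_norm"
    return "$status"
}
```

In a watcher script you could then test `if really_changed "$LOCALCOPY.bak" "$LOCALCOPY"` before sending mail, so that a rotated ad or a new timestamp alone does not trigger a notification.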