Script to download a web page

Question

i made a web server to show my page locally, because is located in a place with a poor connection so what i want to do is download the page content and replace the old one, so i made this script running in background but i am not very sure if this will work 24/7 (the 2m is just to test it, but i want it to wait 6-12 hrs), so, ¿what do you think about this script? is insecure? or is enough for what i am doing? Thanks.

#!/bin/bash
a=1;
while [ $a -eq 1 ]
do
echo "Starting..."
sudo wget http://www.example.com/web.zip  --output-document=/var/www/content.zip
sudo unzip -o /var/www/content.zip -d /var/www/
sleep 2m
done
exit

UPDATE: This code i use now: (Is just a prototype but i pretend not using sudo)

#!/bin/bash
a=1;
echo "Start"
while [ $a -eq 1 ]
do
echo "Searching flag.txt"
if [ -e flag.txt ]; then
    echo "Flag found, and erasing it"
    sudo rm flag.txt

    if [ -e /var/www/content.zip ]; then
    echo "Erasing old content file"
        sudo rm /var/www/content.zip
    fi
    echo "Downloading new content"
    sudo wget ftp://user:password@xx.xx.xx.xx/content/newcontent.zip  --output-document=/var/www/content.zip
    sudo unzip -o /var/www/content.zip -d /var/www/
    echo "Erasing flag.txt from ftp"
    sudo ftp -nv < erase.txt
    sleep 5s
else
    echo "Downloading flag.txt"
    sudo wget ftp://user:password@xx.xx.xx.xx/content/flag.txt
    sleep 5s
fi
echo "Waiting..."
sleep 20s

done
exit 0

erase.txt

open xx.xx.xx.xx
user user password
cd content
delete flag.txt
bye

ghoti · Accepted Answer

Simply unzipping the new version of your content overtop the old may not be the best solution. What if you remove a file from your site? The local copy will still have it. Also, with a zip-based solution, you're copying EVERY file each time you make a copy, not just the files that have changed.

I recommend you use rsync instead, to synchronize your site content.

If you set your local documentroot to something like /var/www/mysite/, an alternative script might then look something like this:

#!/usr/bin/env bash

logtag="`basename $0`[$$]"

logger -t "$logtag" "start"

# Build an array of options for rsync
#
declare -a ropts
ropts=("-a")
ropts+=(--no-perms --no-owner --no-group)
ropts+=(--omit-dir-times)
ropts+=("--exclude ._*")
ropts+=("--exclude .DS_Store")

# Determine previous version
#
if [ -L /var/www/mysite ]; then
    linkdest="$(stat -c"%N" /var/www/mysite)"
    linkdest="${linkdest##*\`}"
    ropts+=("--link-dest '${linkdest%'}'")
fi

now="$(date '+%Y%m%d-%H:%M:%S')"

# Only refresh our copy if flag.txt exists
#
statuscode=$(curl --silent --output /dev/stderr --write-out "%{http_code}" http://www.example.com/flag.txt")
if [ ! "$statuscode" = 200 ]; then
    logger -t "$logtag" "no update required"
    exit 0
fi

if ! rsync "${ropts[@]}" user@remoteserver:/var/www/mysite/ /var/www/"$now"; then
    logger -t "$logtag" "rsync failed ($now)"
    exit 1
fi

# Everything is fine, so update the symbolic link and remove the flag.
#
ln -sfn /var/www/mysite "$now"
ssh user@remoteserver rm -f /var/www/flag.txt

logger -t "$logtag" "done"

This script uses a few external tools that you may need to install if they're not already on your system:

rsync, which you've already read about,
curl, which could be replaced with wget .. but I prefer curl
logger, which is probably installed in your system along with syslog or rsyslog, or may be part of the "unix-util" package depending on your Linux distro.

rsync provides a lot of useful functionality. In particular:

it tries to copy only what has changed, so that you don't waste bandwidth on files that are the same,
the --link-dest option lets you refer to previous directories to create "links" to files that have not changed, so that you can have multiple copies of your directory with only single copies of unchanged files.

In order to make this go, both the rsync part and the ssh part, you will need to set up SSH keys that allow you to connect without requiring a password. That's not hard, but if you don't know about it already, it's the topic of a different question .. or a simple search with your favourite search engine.

You can run this from a crontab every 5 minutes:

*/5 * * * * /path/to/thisscript

If you want to run it more frequently, note that the "traffic" you will be using for every check that does not involve an update is an HTTP GET of the flag.txt file.

Script to download a web page

Answers (2)

Related Questions