Using Git to manage two versions of a website

Question

I'm sure this falls into the Using Git For What It Is Not Intended category, but I wanted to share what I am doing so that experts might comment on it.

I am using virtual hosting on my live server so that I can have 2 versions of my site running at once. One is example.com and the other is staging.example.com. The staging site is used for testing new features and I created a way to link two branches of the site's git repo (say, staging and master) to their respective site roots.

First of all, I set up git on the remote server so that I can checkout the latest master to the web root automatically when I push (using this great technique).

Then, in my post-receive hook I put this:

#!/bin/sh
GIT_WORK_TREE=/var/www git checkout master -f
GIT_WORK_TREE=/path/to/staging/site/webroot git checkout staging -f

With this method, I can keep two versions of the site going (using two branches in git), and when I push, the staging site gets updated with any new changes in the staging branch, and likewise for the master site.

I have found it to be a great way to manage demoing new features before making them public.

Should I not be doing this? Is there a better way? Other ideas or concerns?

Thanks, Jason

torek · Accepted Answer

You wanted to know how to be fancier. I'm going to go through the post-receive email hook script to show how it works (in part). This gets pretty long! :-)

OK, so, here are the crucial bits from post-receive-email (reformatted a bit):

while read oldrev newrev refname; do
    prep_for_email $oldrev $newrev $refname || continue
    generate_email $maxlines | send_mail
done

This reads the old-SHA, new-SHA, and ref names from the input stream (I consider this one of git's failings: you can only read this stream once, then it's gone; this makes connecting up a bunch of otherwise unrelated hooks overly difficult) and invokes two shell functions to examine them and then either ignore, or do something with, each one.
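
One common workaround for the read-once stream is to save it to a temporary file and replay it to each sub-hook. What follows is a minimal sketch, assuming the sub-hooks are ordinary executables that read the usual "oldrev newrev refname" lines on stdin. The run_hooks name and the hook paths in the usage comment are hypothetical.

```shell
# Replay the ref-update stream to several sub-hooks.
# Arguments: commands (hypothetical sub-hook paths) to run in order.
run_hooks() {
    input=$(mktemp) || return 1
    cat >"$input"                       # the single read of the hook's stdin
    status=0
    for hook in "$@"; do
        "$hook" <"$input" || status=1   # each sub-hook reads the saved copy
    done
    rm -f "$input"
    return "$status"
}

# Usage from a post-receive hook (paths are placeholders):
#   run_hooks hooks/post-receive-email hooks/post-receive-deploy
```

The function returns non-zero if any sub-hook fails, but still runs the remaining ones, which seems like the least surprising behavior for a post-receive hook (the push has already happened by then).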

Now here's the crucial bits from the prep_for_email shell function, with commentary:

prep_for_email()
{
    oldrev=$(git rev-parse $1)
    newrev=$(git rev-parse $2)
    refname="$3"

The rev-parse here (plus some code I dropped above that allows command-line arguments) allows you to feed the thing rev-names like "HEAD" and "HEAD^". An actual post-receive hook always gets the raw SHA1, so the rev-parse calls are no-ops.

    # --- Interpret
    # 0000->1234 (create)
    # 1234->2345 (update)
    # 2345->0000 (delete)
    if expr "$oldrev" : '0*$' >/dev/null
    then
        change_type="create"
    else
        if expr "$newrev" : '0*$' >/dev/null
        then
            change_type="delete"
        else
            change_type="update"
        fi
    fi

In a post-receive hook, if the "old" SHA1 is 0000000000000000000000000000000000000000 (all zeros, which is 40 0s; the expr above just checks for all-zeros), this means that the "ref" argument did not exist before, and now does. That's typically a branch create op, but can also be a tag create. On the other hand, if the "new" SHA1 is all zeros, the "ref" argument did exist before, and now doesn't: typically a branch delete op. Anything else, the ref-name used to resolve to the old-rev and now resolves to the new-rev. That's typically a branch update, but it can also be a tag move for instance.
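
The same three-way classification can be pulled out into a small helper. This is a sketch rather than the email hook's code: the function names are made up, and it uses a case pattern in place of the expr test, rejecting empty strings just as the expr version does.

```shell
# True when the argument is non-empty and consists only of zeros.
is_null_rev() {
    case "$1" in
        ''|*[!0]*) return 1 ;;   # empty, or contains a non-zero character
        *)         return 0 ;;
    esac
}

# Print create, delete, or update for an (oldrev, newrev) pair.
classify_change() {
    if is_null_rev "$1"; then
        echo create
    elif is_null_rev "$2"; then
        echo delete
    else
        echo update
    fi
}
```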

Next, the email hook has some good "paranoia style" programming, cross-checking the ref-name against the actual underlying object type in the git repo:

    # --- Get the revision types
    newrev_type=$(git cat-file -t $newrev 2> /dev/null)
    oldrev_type=$(git cat-file -t "$oldrev" 2> /dev/null)
    case "$change_type" in
    create|update)
        rev="$newrev"
        rev_type="$newrev_type"
        ;;
    delete)
        rev="$oldrev"
        rev_type="$oldrev_type"
        ;;
    esac

If the ref-name has the form refs/tags/* and the update is to an annotated tag, this should set both $oldrev_type and $newrev_type to tag. (If it's a lightweight tag these will both be commit instead. If a lightweight tag turned into an annotated tag, the old type will be commit and the new type will be tag, and so on.) And of course, if you're deleting a branch or a tag, the "new" rev will be all 0s and $newrev_type will be the empty string because git cat-file -t will simply fail (that's why the 2> /dev/null).

(Aside: there's no good reason one git cat-file -t has its argument quoted, and one doesn't. Probably just someone got overly happy with quoting after running into the usual empty-string argument issues, but missed one. Fortunately it's harmless either way in this case. :-) )

The email script then has a very long case statement:

case "$refname","$rev_type" in
    ...
esac

which makes sure that an operation on refs/heads/* is always a commit. If not, it prints a message to stderr (which, on a git push, will be sent to whoever did the push, prefixed with remote:, so that someone can see it):

    refs/heads/*,commit)
        # branch
        refname_type="branch"
        short_refname=${refname##refs/heads/}
        ;;
    ...
    *)
        # Anything else (is there anything else?)
        echo >&2 "*** Unknown type of update to $refname ($rev_type)"
        echo >&2 "***  - no email generated"
        return 1
        ;;
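
To make the shape of that case statement concrete, here is a small standalone sketch of the same matching idea. It is not the email hook's actual code: the classify_ref name and its output format are made up for illustration, and only a few ref/type pairs are handled.

```shell
# Print "<kind> <short name>" for a ref name ($1) and object type ($2),
# or complain on stderr and fail for combinations that make no sense.
classify_ref() {
    case "$1,$2" in
        refs/heads/*,commit)
            echo "branch ${1#refs/heads/}" ;;
        refs/tags/*,tag)
            echo "annotated-tag ${1#refs/tags/}" ;;
        refs/tags/*,commit)
            echo "lightweight-tag ${1#refs/tags/}" ;;
        *)
            echo >&2 "*** Unknown type of update to $1 ($2)"
            return 1 ;;
    esac
}
```

For example, classify_ref refs/heads/zorg commit prints "branch zorg", while a refs/heads/* name paired with any type other than commit falls through to the error arm.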

Now for the useful bits from generate_email. Look at the actual script for details, but, in this case we only really care about the call to generate_email_header, plus the case that invokes generate_update_branch_email.

Here's the header:

generate_email_header()
{
    # --- Email (all stdout will be the email)
    # Generate header
    cat <<-EOF
    To: $recipients
    Subject: ${emailprefix}$projectdesc $refname_type $short_refname ${change_type}d. $describe
    X-Git-Refname: $refname
    X-Git-Reftype: $refname_type
    X-Git-Oldrev: $oldrev
    X-Git-Newrev: $newrev

    This is an automated email from the git hooks/post-receive script. It was
    generated because a ref change was pushed to the repository containing
    the project "$projectdesc".

    The $refname_type, $short_refname has been ${change_type}d
    EOF
}

$change_type, in the case we care about, is update so this says things like: branch zorg updated. ($describe is the output from git describe $newrev, or if that's empty—i.e., there were no annotated tags for git describe to use—just $newrev. $recipients, $emailprefix, and $projectdesc are from various configurables.)

In generate_update_branch_email, there's a whole lot of stuff to compute and print-for-emailing exactly which commits were removed and added; and then it ends with:

    echo "Summary of changes:"
    git diff-tree --stat --summary --find-copies-harder $oldrev..$newrev

Basically, an update to a branch reference $refname of the form refs/heads/* (say, refs/heads/zorg), from an existing $oldrev that was not all 0s to a new $newrev that is not all 0s, means that the branch people normally refer to as $short_refname (zorg) has had its tip moved. That is a whole lot simpler than the code to figure out exactly what that move means!

In your case, you don't need to worry about branch creation; you can treat that like a regular update. You might (or might not) want to do something special on branch deletion. Most of the time you only care about branch updates ... and all you want to do is, if staging has been updated, update your special staging copy; if master has been updated, update your special master copy. In each case, the update will be "to the new branch tip", which is really easy to access.

So, if we put all this together, we get the following (untested, but pretty trivial) hook, to which I've added the ability to invoke it from the command line. To make it more useful, when invoked from the command line, you can specify the revision you want checked-out wherever it goes, e.g., you can say ./hookscript unused master~2 refs/heads/staging to shove rev master~2 into the staging copy area (assuming you wanted to do that for some reason).

#! /bin/bash
#
# handle updates to our two interesting branches, staging and master.

# function to dump given commit state to target directory
# arguments: $1 - rev; $2 - target dir
copy_to_dir() {
    GIT_WORK_TREE="$2" git checkout -q -f "$1"
}

# function to handle an update to staging branch.
# arguments: $1 - rev to check out
update_staging() {
    copy_to_dir "$1" /path/to/staging/site/webroot
}

# function to handle an update to master branch.
update_master() {
    copy_to_dir "$1" /var/www
}

# function to handle one reference-change.
# arguments:
#    $1 - old revision, or all-0s on create
#    $2 - new revision, or all-0s on removal
#    $3 - reference (refs/heads/*, refs/tags/*, etc)
refchange() {
    local oldrev="$1" newrev="$2" ref="$3"
    local deleted=false
    local shortref

    if expr "$newrev" : '0*$' >/dev/null; then
        deleted=true
    elif ! git rev-parse --verify "$newrev" >/dev/null; then
        return # git rev-parse already printed an error
    fi

    case $ref in
    refs/heads/staging|refs/heads/master)
        shortref=${ref#refs/heads/};;
    *)
        return;;
    esac

    # someone pushed a change to staging branch or master branch
    if $deleted; then {
        echo "WARNING: you've deleted branch $shortref"
        echo "are you sure you wanted to do that?"
        echo "The operating copy is still operating, and"
        echo "will be updated when the branch is re-created."
        } 1>&2
        return
    fi

    # update either the staging copy or the master copy
    update_$shortref "$newrev"
}

# main driver: update from input stream (if no arguments) or use arguments
case $# in
3)  refchange "$1" "$2" "$3";;
0)  while read -r oldrev newrev refname; do
        refchange "$oldrev" "$newrev" "$refname"
    done;;
*)  echo "ERROR: hook called with $# arguments, expected 0 or 3" 1>&2
    exit 1;;
esac

Note that because I'm using git checkout -q -f on a parsed rev (rather than a name like staging or master), this leaves the repository's HEAD detached after each push. More importantly, if you use the command-line trick from a "normal" (non-bare) repo on the web server, it will unmoor your current branch. This is not fatal but could easily be annoying. To avoid that, replace the contents of copy_to_dir with, e.g., git archive "$1" | (rm -rf "$2" && mkdir "$2" && cd "$2" && tar xf -).
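
Spelled out as a function, the git archive variant might look like the following sketch. It assumes the target directory holds nothing but the deployed copy, since it is removed and recreated on every update. The empty-path check and the rev-parse check are additions of mine, not part of the one-liner above.

```shell
# Export commit $1 into directory $2 without touching HEAD or the index.
# WARNING: $2 is deleted and recreated, so it must contain only the site copy.
copy_to_dir() {
    [ -n "$2" ] || return 1                       # refuse an empty target path
    git rev-parse -q --verify "$1^{commit}" >/dev/null || return 1
    rm -rf "$2" && mkdir -p "$2" || return 1
    git archive "$1" | (cd "$2" && tar xf -)      # unpack the tree, no .git
}
```

Because the old copy is removed before the new one is unpacked, the site is briefly empty during each update. If that matters, unpacking into a fresh directory and renaming it into place afterwards would be the usual refinement.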

(Normally you'd be pushing to a --bare repo, and changing its idea of "current branch" is not a problem, since nobody cares about that.)
