Alex Bollbach

Reputation: 4590

What is a reliable way to detect whether a file or directory has changed in the most recent commit?

I have a web project which I deploy to an EC2 instance simply by pushing new commits. I use the remote post-receive Git hook to execute a shell script that 'deploys' the project by checking it out into a production directory. The steps are: run npm install on the Express app, run npm install on the frontend (a create-react-app app), then run npm run build (which uses webpack to build an optimized distribution folder from my React source code).

These steps are expensive and in many cases not needed. E.g., if all I did was update a React component in src/components/, then npm run build should run, but npm install on the server and frontend shouldn't. If all I did was add a comment to my Express app, no scripts should run.

My current server-side deploy script looks like this:

#!/usr/bin/env bash

GIT_WORK_TREE=/home/ec2-user/absiteProd git checkout -f

### TODO: conditional NPM work

pm2 restart index

My question, then, is how can I use git (or grep, sed, awk, etc.) to reliably tell me when either /home/ec2-user/absiteProd/frontend/package.json, /home/ec2-user/absiteProd/server/package.json, or anything in /home/ec2-user/absiteProd/frontend/src has changed?

Currently I'm having some success with:

if git log --stat -n 1 | grep --quiet 'frontend/src/'; then
   cd /home/ec2-user/absiteProd/frontend
   npm run build
fi

But since this seems like such a common requirement in app deployment, I feel like there must be a simpler way?

Upvotes: 1

Views: 1354

Answers (2)

VonC

Reputation: 1328572

You can find a similar need in this thread:

How do I find a last commit for the given directory inside the repository?

I want to avoid rebuilding the specific part of the project if there were no changes in it since the last build, so I need to find the sha of the last time the directory was changed.

You can compare the last commit where an element is modified, using git rev-list:

git rev-list -1 HEAD -- frontend/package.json
git rev-list -1 HEAD -- server/package.json
git rev-list -1 HEAD -- frontend/src

with the current HEAD SHA1 (git rev-parse, the --verify is optional):

git rev-parse --verify HEAD

That is:

h=$(git rev-parse --verify HEAD)
b=false
# Set b=true if HEAD itself is the last commit that touched any watched path.
if [[ "$(git rev-list -1 HEAD -- frontend/package.json)" == "${h}" ]]; then b=true; fi
if [[ "$(git rev-list -1 HEAD -- server/package.json)" == "${h}" ]]; then b=true; fi
if [[ "$(git rev-list -1 HEAD -- frontend/src)" == "${h}" ]]; then b=true; fi
if ! $b; then exit 0; fi
cd /home/ec2-user/absiteProd/frontend
npm run build
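
Since the question wants a different action per path, the same rev-list comparison could be wrapped in a small helper. This is only a sketch: the function name changed_in_head is made up for illustration. Note that it inspects the tip commit only, so a push that carries several commits may have changed a path in an earlier one without being noticed here.

```shell
#!/usr/bin/env bash
# changed_in_head is a hypothetical helper name, not a Git command.
# It succeeds when the newest commit that touched any of the given
# paths is HEAD itself, i.e. the paths changed in the latest commit.
changed_in_head() {
    local head last
    head=$(git rev-parse --verify HEAD) || return 2
    last=$(git rev-list -1 HEAD -- "$@")
    [ -n "$last" ] && [ "$last" = "$head" ]
}

# Possible use in the deploy script (commented out so nothing runs here):
# changed_in_head server/package.json   && echo "server needs npm install"
# changed_in_head frontend/package.json && echo "frontend needs npm install"
# changed_in_head frontend/src          && echo "frontend needs npm run build"
```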

Upvotes: 1

torek

Reputation: 489698

Git does not store directories in any useful way, so you must define what you mean by "anything in" yourself. This has its advantages, since you can use your own definition rather than getting stuck with someone else's useless-to-you definition, but it means you must do more work.

That said, Git stores each file as a path name within each commit. Your deployment script takes some work-tree—in this case, /home/ec2-user/absiteProd—from one state to another. Since it uses git checkout to do so, and git checkout does nothing special with time stamps, you now have many options with many different low-level details and subsequent consequences. Here are two obvious-ish and reasonably simple starting points:

  • Was /home/ec2-user/absiteProd exactly the same as some previous commit? If so, which commit? (Commits have unique hash IDs and these are generally the things to use in scripts.) You can then have Git compare the previous commit with the new commit, using git diff --name-status for instance. This is similar to what you are doing now, but better.

    If your deployment script is a post-receive script, you already have both the old and new hash IDs of the reference, which you have read from standard input. Hence the set of files changed, with their statuses, between those two commits, is:

     git diff-tree -r --name-status "$oldhash" "$newhash"
    
  • If git checkout wrote on any file(s), those files will have "now" as their modify-time time-stamps, since git checkout just lets the system's time apply to updated files. Can you use this? As long as you never deploy more than once in a single second, you could combine this with the make build-system, which builds files based on time-stamps.
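
For the first option, a post-receive hook might turn the diff-tree output into a list of deployment steps roughly as follows. This is a sketch under assumptions: the function names (changed_paths, steps_for) and the step labels are invented for illustration, and the path patterns are taken from the layout described in the question. Making a frontend/package.json change trigger a rebuild as well as an install is a choice, not a requirement.

```shell
#!/usr/bin/env bash
# changed_paths and steps_for are hypothetical helper names.

# Print the paths that differ between two commits, one per line.
changed_paths() {
    local old new
    old=$1
    new=$2
    case $new in
        *[!0]*) ;;         # a real commit ID, carry on
        *) return 0 ;;     # all zeros: the ref was deleted, nothing to deploy
    esac
    case $old in
        *[!0]*) git diff-tree -r --name-only "$old" "$new" ;;
        *)      git ls-tree -r --name-only "$new" ;;  # new ref: list every file
    esac
}

# Read paths on stdin and print the (made-up) step labels they call for.
steps_for() {
    local path install_server install_frontend build
    install_server=0 install_frontend=0 build=0
    while IFS= read -r path; do
        case $path in
            server/package.json)   install_server=1 ;;
            frontend/package.json) install_frontend=1; build=1 ;;
            frontend/src/*)        build=1 ;;
        esac
    done
    [ "$install_server" -eq 1 ] && echo install-server
    [ "$install_frontend" -eq 1 ] && echo install-frontend
    [ "$build" -eq 1 ] && echo build-frontend
    return 0
}

# In the hook itself, Git supplies "old new refname" lines on stdin:
# while read -r oldhash newhash ref; do
#     changed_paths "$oldhash" "$newhash" | steps_for
# done
```

A commit that only touches, say, server/index.js produces no step labels at all, which matches the "no scripts should run" case from the question.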

If make is suitable here, it is probably the best choice, except for its maximum of one-per-second deployment (or whatever your underlying OS has for time stamp resolution on files). You can just declare that whatever the output file(s) is/are, they depend on the corresponding input file(s), and give the recipe to build the output(s) from the input(s) and run make.
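
If a full Makefile feels like too much, the same timestamp rule can be approximated in the deploy script itself. The sketch below is not make, just a shell imitation of its "rebuild when an input is newer than the output" test. The name needs_rebuild and the stamp file .installed are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# needs_rebuild is a hypothetical helper name.
# Usage: needs_rebuild TARGET SOURCE...
# Succeeds when TARGET is missing or any SOURCE file is newer than it.
needs_rebuild() {
    local target src
    target=$1
    shift
    [ -e "$target" ] || return 0
    for src in "$@"; do
        [ "$src" -nt "$target" ] && return 0
    done
    return 1
}

# Possible use after git checkout (commented out so nothing runs here);
# node_modules/.installed is an invented stamp file touched after each install:
# if needs_rebuild node_modules/.installed package.json; then
#     npm install && touch node_modules/.installed
# fi
```

As with make, this relies on the file system's time stamp resolution, so two deployments within the same tick can be indistinguishable.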

Upvotes: 1
