Nicola Leoni

Reputation: 874

How to automate git history squash by date?

I have a git repository that I use as a folder sync system: any time I change a file on my laptop, PC or mobile, the changes are automatically committed. No branches, single user.

This leads to lots of commits, around 50 per day. I would like to write a bash cron script that automates squashing the history down to a single commit per day; the commit messages don't matter, but the dates should be preserved.

I tried git rebase -i SHA~count, but I can't figure out how to automate the process, i.e. pick the first commit and squash the other count commits.

Any suggestions?

I have no problem writing the bash that finds the first SHA of each date and counts the commits to merge; some loop over this would do the trick:

git log --reverse|grep -E -A3 ^commit| \
  grep -E -v 'Merge|Author:|--|^$'|paste - -| \
  perl -pe 's/commit (\w+)\s+Date:\s+\w+\s+(\w+)\s+(\d+).+/\2_\3 \1/'
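A somewhat more robust way to get the first SHA and the commit count for each date is to ask git log for exactly the fields needed instead of parsing its default output. This is only a sketch under stated assumptions: the function name `commits_per_day` is hypothetical, it assumes a linear history, and it groups by committer date in each commit's own recorded time zone.

```shell
# Hypothetical helper: for each committer date (YYYY-MM-DD, in each commit's own
# recorded time zone), print the oldest commit of that day and the number of commits.
# Output lines look like "2020-01-01 <sha> 2", oldest day first.
commits_per_day() {
  git log --reverse --date=short --pretty=tformat:'%cd %H' |
    awk '
      $1 != day {                      # a new date starts: flush the previous one
        if (day != "") print day, first, n
        day = $1; first = $2; n = 0
      }
      { n++ }                          # count every commit of the current date
      END { if (day != "") print day, first, n }
    '
}
```

Run inside the repository, e.g. `commits_per_day | while read -r day sha count; do ...; done`, to loop over the days.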

Upvotes: 3

Views: 963

Answers (3)

Alderath

Reputation: 3859

From my understanding, you intend to do something along the lines of this:

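#!/bin/bash
# Oldest commit from the last 24 hours (git log lists newest first, so take the last line)
FIRST_COMMIT_HASH_TODAY="$(git log --since="1 days ago" --pretty=format:%H | tail -n 1)"
# Move HEAD to the commit just before it, keeping the index and work tree as they are
git reset --soft "${FIRST_COMMIT_HASH_TODAY}^"
git commit -m "Squashed changes for $(date +%F)"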

I.e.:

  1. List the hashes of all commits made during the last 24 hours, and extract the oldest of them.
    (In its current form, this assumes that there is at least one commit each day.)
  2. Move the repo's HEAD pointer to the commit before $FIRST_COMMIT_HASH_TODAY, but keep the work tree and index unchanged.
  3. Commit the squashed changes.
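If the squashed commit should keep the date of the day's last commit rather than the time the cron job runs (the question asks for the date to be preserved), the dates can be passed along explicitly. A minimal sketch, assuming a linear history and Git 2.2 or later for the `%aI`/`%cI` formats; `squash_from` is a hypothetical name, and its argument is the oldest commit to fold in, which must not be the root commit:

```shell
# Hypothetical helper: squash everything from commit $1 up to HEAD into one
# commit that keeps the author and committer dates of the current HEAD.
squash_from() {
  local first="$1" author_date committer_date
  # Remember the newest commit's dates before moving HEAD
  author_date="$(git log -1 --format=%aI HEAD)" || return 1
  committer_date="$(git log -1 --format=%cI HEAD)" || return 1
  # Move HEAD to the parent of $first; the index and work tree are left untouched
  git reset --soft "${first}^" || return 1
  # Re-commit the whole range, reusing the saved dates
  GIT_COMMITTER_DATE="$committer_date" \
    git commit -q --date="$author_date" -m "Squashed changes for ${author_date%%T*}"
}
```

With this, `squash_from "$FIRST_COMMIT_HASH_TODAY"` could stand in for the `git reset --soft` and `git commit` lines of the script above.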

A word of caution, though: you're effectively rewriting history. You can no longer just run git pull to sync the changes, because if a client repo still has the original commit history while the server has the rewritten history, you will get something like:

Your branch and 'origin/master' have diverged,
and have 50 and 1 different commit(s) each, respectively.
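One common way to deal with this, sketched here under the assumption that each client's branch tracks a branch on `origin` and that nothing local needs to be kept, is to force-push from the machine that rewrote the history (e.g. `git push --force-with-lease`) and then have every other client drop its old history and jump to the rewritten one. `resync_with_origin` is a hypothetical name:

```shell
# Hypothetical helper for the *other* clients after the history was rewritten and
# force-pushed. WARNING: it throws away local commits and uncommitted changes on
# the current branch, replacing them with the branch's upstream.
resync_with_origin() {
  git fetch --quiet origin || return 1
  git reset --hard --quiet '@{upstream}'
}
```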

<EDIT>

If you want to process the entire history, one approach would be to use some variant of git filter-branch. I've put one example below, but it has many weaknesses, so you might want to improve it a bit.

Weaknesses/characteristics:

  • Simply ignores the time zones in git's raw timestamps (weird behaviour if commits were made in different time zones).
  • Identifies the latest commit on the branch you want to process by its root tree hash (weird behaviour if multiple commits have the same root tree, e.g. a revert commit reverting its parent commit).
  • Assumes a linear branch history (weird behaviour if there are merge commits in the branch).
  • Doesn't specifically create one commit per day. Instead, for each commit, it checks whether more than 24 hours have elapsed since the previously kept commit; if not, it skips that commit.
  • Always keeps the first and last commits, regardless of how close in time they are to the subsequent/previous commits.
  • Works based on GIT_COMMITTER_DATE rather than GIT_AUTHOR_DATE.
  • Not well tested, so make sure to back up the original repo if you are going to try to run this.

Example command:

LATEST_TREE=$(git rev-parse HEAD^{tree}) git filter-branch --commit-filter '
  # Arguments are: TREE [-p PARENT]..., so $3 is the first parent hash (empty for the root commit)
  if [ -z "$3" ]
  then
    # First commit. Keep it.
    git commit-tree "$@"
  elif [ "$1" = "$LATEST_TREE" ]
  then
    # Latest commit (recognised by its root tree hash). Keep it.
    git commit-tree "$@"
  else
    # Raw dates look like "1357000000 +0100"; keep only the seconds since the epoch
    PREVIOUS_COMMIT_COMMITTER_DATE="$(git log -1 --date=raw --pretty=format:%cd "$3")"
    PREVIOUS_COMMIT_COMMITTER_DATE_NO_TIMEZONE="$(echo "$PREVIOUS_COMMIT_COMMITTER_DATE" | grep -E -o "[0-9]{5,10}")"
    GIT_COMMITTER_DATE_NO_TIMEZONE="$(echo "$GIT_COMMITTER_DATE" | grep -E -o "[0-9]{5,10}")"
    SECONDS_PER_DAY=86400

    if [ $((GIT_COMMITTER_DATE_NO_TIMEZONE - PREVIOUS_COMMIT_COMMITTER_DATE_NO_TIMEZONE)) -gt "$SECONDS_PER_DAY" ]
    then
      # More than 24 hours elapsed since the previously kept commit. Keep this commit.
      git commit-tree "$@"
    else
      skip_commit "$@"
    fi
  fi' HEAD

If you had a command to extract the hashes of the commits you want to keep, you could get the root tree hash of each of those commits and store them in a separate file. Then you could change the commit-filter condition to check "is the current root tree hash present in the file of desired root tree hashes?" instead of "have more than 24 hours elapsed since the previous commit?". (This would amplify the "identify commits by root tree hash" issue mentioned above, though, as it would apply to all commits rather than just the latest one.)
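A rough sketch of that idea, assuming a linear history and a clean work tree; `keep_last_tree_per_day` and `KEEP_TREES` are hypothetical names. It writes the root tree hash (`%T`) of the last commit of each committer day to a temporary file and lets the commit filter keep only commits whose tree is listed, so it inherits the tree-hash caveat above:

```shell
# Hypothetical helper: keep only the last commit of each (committer) day by
# listing the root tree hashes of those commits and filtering on them.
keep_last_tree_per_day() {
  KEEP_TREES="$(mktemp)" || return 1
  export KEEP_TREES
  # git log lists newest first, so the first line seen for a date is that day's last commit
  git log --date=short --pretty=tformat:'%cd %T' | awk '!seen[$1]++ { print $2 }' > "$KEEP_TREES"
  # In a commit filter, $1 is the tree hash of the commit being rewritten
  git filter-branch -f --commit-filter '
    if grep -qxF "$1" "$KEEP_TREES"; then
      git commit-tree "$@"
    else
      skip_commit "$@"
    fi' HEAD
  local status=$?
  rm -f "$KEEP_TREES"
  return "$status"
}
```

Recent Git versions print a warning about filter-branch and pause for a few seconds; setting `FILTER_BRANCH_SQUELCH_WARNING=1` silences it.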

</EDIT>

Upvotes: 3

Nicola Leoni

Reputation: 874

I'm sharing the results based on Alderath's suggestions: I used git filter-branch to parse the history and keep just the last commit of each day. A first loop over git log writes the timestamps of the commits that need to be preserved (the last of each day) to a temporary file; then git filter-branch keeps only the commits whose timestamp is present in that file.

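#!/bin/bash

# Extract the timestamps of the commits to keep (the last one of each day).
# git log lists newest first, so the first commit seen for a date is that day's last.
TOKEEP="$(mktemp)"
export TOKEEP
DATE=
for time in $(git log --date=raw --pretty=format:%cd | cut -d' ' -f1); do
   CDATE="$(date -d "@$time" +%Y%m%d)"   # GNU date: epoch seconds -> local YYYYMMDD
   if [ "$DATE" != "$CDATE" ]; then
       echo "@$time" >> "$TOKEEP"
       DATE="$CDATE"
   fi
done

# Scan the repository, keeping only the selected commits.
# GIT_COMMITTER_DATE looks like "@1357000000 +0100"; strip the time zone, then match whole lines.
git filter-branch -f --commit-filter '
    if grep -qxF "${GIT_COMMITTER_DATE% *}" "$TOKEEP"; then
        git commit-tree "$@"
    else
        skip_commit "$@"
    fi' HEAD
rm -f "$TOKEEP"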

Upvotes: 5

innomatics

Reputation: 369

If you know the number of commits you want to go back, you could just use git reset --soft and then make a new commit, e.g.:

COMMIT_COUNT=$(git log --pretty=oneline --since="1 days" | wc -l) 
git reset --soft HEAD~$COMMIT_COUNT
git commit -m "Today's work" 

Upvotes: 0
