syvex

Reputation: 7756

How can I manually remove a blob object from a tree in Git?

Say that I have something similar to this when I run git ls-tree -r master:

100644 blob a450cb6b6371494ab4b3da450f6e7d543bfe3493    FooBar/readme.txt
100644 blob a339338d7ad5113740244e7f7d3cbb236cb47115    Foobar/readme.txt
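
For reference, a minimal sketch of spotting paths like these, which differ only in letter case, straight from the tree without checking anything out. The helper name find_case_collisions is hypothetical. It assumes ASCII-only case differences because it runs sort and uniq in the C locale.

```shell
#!/bin/bash
# Sketch: print one path from each group of paths in a tree that differ only
# in letter case. LC_ALL=C gives a stable byte order; in that locale sort -f
# and uniq -i fold ASCII letters only, so non-ASCII case pairs are missed.
# find_case_collisions is a hypothetical helper name.
find_case_collisions() {
    local treeish=${1:-HEAD}
    git ls-tree -r --name-only "$treeish" |
        LC_ALL=C sort -f |
        LC_ALL=C uniq -d -i
}
```

Running find_case_collisions master against the listing above should print a single line for the FooBar/Foobar pair. sort -f makes the colliding names adjacent, and uniq -d -i then reports each adjacent case-insensitive duplicate once.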

How can I remove the second blob from this tree object?

I'm assuming that on POSIX systems with a case-sensitive file system this can be done by just running git rm Foobar/readme.txt. How would I do the same thing on Windows, where FooBar and Foobar refer to the same directory?

Upvotes: 3

Views: 2756

Answers (2)

torek

Reputation: 489083

OK, so I spent a little time testing this on macOS, which has similar problems with case folding.

I don't know whether all versions of Git are "the same enough", or whether Git for Windows behaves the same way, but this script does the trick without going any deeper into Git plumbing than ls-tree -r, cat-file, and rm --cached.

The script is also only lightly tested. (Note: the tabs got mangled. Copy and paste kept them, but I had to re-indent for Stack Overflow, so the indentation below may be off.)

#! /bin/bash

usage()
{
    cat << EOF
usage: $0 [-h] [-r] [branch]

-h: print usage help
-r: rename ALL colliding files to their hashes
EOF
}

DO_RENAME=false
while getopts "hr" opt; do
    case $opt in
    h) usage; exit 0;;
    r) DO_RENAME=true;;
    *) usage 1>&2; exit 1;;
    esac
done
shift $((OPTIND - 1))

case $# in
0) branch=HEAD;;
1) branch=$1;;
# too many arguments: complain and stop instead of carrying on
*) usage 1>&2; exit 1;;
esac

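# literal tab, so that it's easily distinguished from spaces
TAB=$(printf '\t')

branch=$(git rev-parse --symbolic "$branch") || exit

# an explicit XXXXXX template works with both BSD (macOS) and GNU (Git Bash) mktemp
tempfile=$(mktemp "${TMPDIR:-/tmp}/git-casecoll.XXXXXX") || exit
# remove the temp file on any exit without overriding the exit status
trap 'rm -f "$tempfile"' 0
trap 'exit 1' 1 2 3 15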

# First, let's find out whether there *are* any file name
# case collisions in the tree.
git ls-tree -r "$branch" > "$tempfile"
nfiles=$(wc -l < "$tempfile" | sed 's/  *//g')
n2=$(sort "-t$TAB" -k2 -f -u "$tempfile" | wc -l | sed 's/  *//g')
if [ "$nfiles" -eq "$n2" ]; then
    echo no collisions found
    exit 0
fi
echo "$((nfiles - n2)) collision(s) found"

# functions needed below

# decode git escapes in pathnames
decode_git_pathname()
{
    local path="$1"
    case "$path" in
    \"*\")
        # strip off leading and trailing double quotes
        path=${path#\"}
        path=${path%\"}
        # change every % into %% so printf treats it literally
        path=${path//\%/%%}
        # and then interpret backslash escapes with printf
        printf -- "$path";;
    *)
        # not encoded, just print it as is
        printf %s "$path";;
    esac
}

show_or_do_rename()
{
    local mode=$1 path="$(decode_git_pathname "$2")" sha1=$3
    local renamed_to="$(dirname "$path")/$sha1"
    local ftype=${mode:0:2}

    # only regular files (mode 100644 or 100755) are handled
    if [ "$ftype" != 10 ]; then
        echo "WARNING: I don't handle $ftype files ($mode $path) yet"
        return 1
    fi
    if $DO_RENAME; then
        # git mv does not work, but git rm --cached does
        git rm --cached --quiet "$path"
        rm -f "$path"
        git cat-file -p "$sha1" > "$renamed_to"
        chmod "${mode:2}" "$renamed_to"
        git add "$renamed_to"
        echo "renamed: $path => $renamed_to"
    else
        echo will: mv "$path" "$renamed_to"
    fi
}

# Now we have to find which ones they were, which is more difficult.
# We still want the sorted ones with case folded, but we don't want
# to remove repeats, instead we want to detect them as we go.
#
# Note that Dir/file collides with both dir/file and dir/File,
# so if we're doing rename ops, we'll rename all three.  We also
# don't know if we're in a collision-group until we hit the second
# entry, so the first time we start doing a collision-group, we
# must rename two files, and from then on (in the same group) we
# only rename one.
prevpath=""
prevlow=""
prevsha=
prevmode=
in_coll=false
# sort on the path only (the field after the tab), folding case, so that
# colliding paths end up next to each other
sort "-t$TAB" -k2 -f "$tempfile" |
while IFS="$TAB" read -r info git_path; do
    set -- $info
    mode=$1
    # otype=$2  -- we don't care about the object type;
    # it should always be "blob"
    sha1=$3
    lowered="$(printf %s "$git_path" | tr '[:upper:]' '[:lower:]')"
    if [ "$prevlow" = "$lowered" ]; then
        if $in_coll; then
            echo "      and: $prevpath vs $git_path"
            show_or_do_rename "$mode" "$git_path" "$sha1"
        else
            echo "collision: $prevpath vs $git_path"
            show_or_do_rename "$prevmode" "$prevpath" "$prevsha"
            show_or_do_rename "$mode" "$git_path" "$sha1"
            in_coll=true
        fi
    else
        prevlow="$lowered"
        prevpath="$git_path"
        prevsha=$sha1
        prevmode=$mode
        in_coll=false
    fi
done
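
The decode_git_pathname helper exists because, by default, ls-tree quotes "unusual" path names, using C-style escapes such as \303\251 for non-ASCII bytes. A possible alternative, sketched here under the assumption of bash and not tested on Windows, is to ask ls-tree for NUL-terminated output with -z. Paths are then printed verbatim and need no decoding. The helper name list_tree_entries is hypothetical.

```shell
#!/bin/bash
# Sketch: walk a tree with "git ls-tree -r -z". With -z each record is
# "mode SP type SP sha1 TAB path" terminated by a NUL byte, and the path is
# printed verbatim, so no unquoting step is needed.
# list_tree_entries is a hypothetical helper name.
list_tree_entries() {
    local treeish=${1:-HEAD} tab=$'\t' rec info path mode otype sha1
    git ls-tree -r -z "$treeish" |
        while IFS= read -r -d '' rec; do
            info=${rec%%"$tab"*}   # everything before the first tab
            path=${rec#*"$tab"}    # everything after the first tab (kept intact)
            read -r mode otype sha1 <<< "$info"
            printf 'mode=%s type=%s sha1=%s path=%s\n' \
                "$mode" "$otype" "$sha1" "$path"
        done
}
```

Splitting on the first tab with parameter expansion, rather than letting read split on IFS, keeps a path that itself contains tabs intact.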

Here's a sample run. I made a "bad for Windows" repo on a Linux box, then cloned it to a Mac.

$ git clone ...
Initialized empty Git repository in /private/tmp/caseissues/.git/
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 16 (delta 1), reused 0 (delta 0)
Receiving objects: 100% (16/16), done.
Resolving deltas: 100% (1/1), done.
$ cd caseissues
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   FooBar/readme.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
$ git-casecoll.sh 
1 collision(s) found
collision: FooBar/readme.txt vs Foobar/readme.txt
will: mv FooBar/readme.txt FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
will: mv Foobar/readme.txt Foobar/591415e1e03bd429318f4d119b33cb76dc334772
$ git-casecoll.sh -r
1 collision(s) found
collision: FooBar/readme.txt vs Foobar/readme.txt
renamed: FooBar/readme.txt => FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
renamed: Foobar/readme.txt => Foobar/591415e1e03bd429318f4d119b33cb76dc334772
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    FooBar/readme.txt -> FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
#   renamed:    Foobar/readme.txt -> Foobar/591415e1e03bd429318f4d119b33cb76dc334772
#

(At this point I pick my own names for the fixed files. Note that I let the shell autocomplete the paths and had to try again, manually lower-casing the b in FooBar, because of the case weirdness.)

$ git mv FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519 FooBar/readme_A.txt
$ git mv FooBar/591415e1e03bd429318f4d119b33cb76dc334772 FooBar/readme_B.txt
fatal: not under version control, source=FooBar/591415e1e03bd429318f4d119b33cb76dc334772, destination=FooBar/readme_B.txt
$ git mv Foobar/591415e1e03bd429318f4d119b33cb76dc334772 FooBar/readme_B.txt
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    FooBar/readme.txt -> FooBar/readme_A.txt
#   renamed:    Foobar/readme.txt -> FooBar/readme_B.txt
#
$ git commit -m 'fix file name case issue'
[master 4ef3a55] fix file name case issue
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename FooBar/{readme.txt => readme_A.txt} (100%)
 rename Foobar/readme.txt => FooBar/readme_B.txt (100%)

Upvotes: 1

ralphtheninja

Reputation: 133118

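git filter-branch with --index-filter might work, since it operates on the index rather than on the working tree, so the case-insensitive file system never comes into play. Be aware that it rewrites every commit reachable from HEAD. Try something like: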

git filter-branch --index-filter 'git rm --cached --ignore-unmatch Foobar/readme.txt' HEAD
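
If only the tip of a branch needs fixing, rather than all of history, the same index-only idea can be sketched with plumbing and a throwaway index file. This sketch assumes that git update-index --force-remove matches the given path exactly, including letter case. I have not verified that on Windows with core.ignorecase set. The helper name remove_path_from_branch is hypothetical.

```shell
#!/bin/bash
# Sketch: add a commit on top of a branch whose tree lacks one exact path,
# using only plumbing and a throwaway index file (GIT_INDEX_FILE), so the
# real index and the working tree are never touched.
# remove_path_from_branch is a hypothetical helper name.
remove_path_from_branch() {
    local branch=$1 path=$2 old tmpdir tree commit status
    old=$(git rev-parse --verify "refs/heads/$branch") || return 1
    tmpdir=$(mktemp -d) || return 1
    # Load the branch tip's tree into the temporary index.
    GIT_INDEX_FILE="$tmpdir/index" git read-tree "$old" &&
    # Drop the entry for exactly this path; --force-remove ignores the working tree.
    GIT_INDEX_FILE="$tmpdir/index" git update-index --force-remove -- "$path" &&
    # Write the edited index out as a new tree object.
    tree=$(GIT_INDEX_FILE="$tmpdir/index" git write-tree) &&
    # Make a commit with that tree whose parent is the old tip.
    commit=$(git commit-tree -p "$old" -m "Remove $path" "$tree") &&
    # Move the branch, but only if it still points at the old tip.
    git update-ref "refs/heads/$branch" "$commit" "$old"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example: remove_path_from_branch master Foobar/readme.txt. If master is the checked-out branch, the real index still holds the old entry afterwards, so something like git reset (which resets the index but not the working tree) is probably needed.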

Upvotes: 1
