carl

Reputation: 4426

how to list the files for the next git push command

I am trying to push a branch to origin with

git push --set-upstream origin v0.8

This seems to take forever and eventually stops with an error

Counting objects: 180, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (92/92), done.
Writing objects: 100% (180/180), 538.00 MiB | 72.00 KiB/s, done.
Total 180 (delta 142), reused 110 (delta 87)
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: eef60ca4521006cb11e4b7f181bc7a1a
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File X.sql is 1537.98 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/X/X.git
  ! [remote rejected] v0.8 -> v0.8 (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/X/X.git'

so apparently it tries to push a 1.5 GB file named X.sql...? I can't see this file anywhere. So I had a look at the log with

git log v0.8 --not --remotes=origin

which gave

commit 046332334e1f944f64a110f92434cdc26e9fafd0
Author: X
Date:   Thu Jun 9 23:47:27 2016 +0100

search branch pushed to remote

commit 4b6d7c87a34bcd43f098d54263a032bb66baf9db  
Merge: 631d55a 539e3dc 
Author: X
Date:   Sun Jun 5 22:10:28 2016 +0100

Merge branch 'master' of https://github.com/X

commit 631d55a0998e99ebc7614bf4f58b85baa4e85403  
Author: X
Date:   Sun Jun 5 22:10:15 2016 +0100

once

commit 4aa7275f4381c222fff7ba9ae22ab00df886ba3b
Author: fbeutler X
Date:   Sun Jun 5 22:09:27 2016 +0100

once

how can I see all files connected to a commit, just to check which one has the large file? From the answers below I saw that I probably committed a large file and then deleted it. In this case git rebase would be the way to remove it. However, rebase does not work if there is no upstream branch? Here is the output for rebase

git rebase -i
There is no tracking information for the current branch.
Please specify which branch you want to rebase against.
See git-rebase(1) for details

    git rebase <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> v0.8

If I follow this advice I get

git branch --set-upstream-to=origin/v0.8 v0.8
error: the requested upstream branch 'origin/v0.8' does not exist
hint: 
hint: If you are planning on basing your work on an upstream
hint: branch that already exists at the remote, you may need to
hint: run "git fetch" to retrieve it.
hint: 
hint: If you are planning to push out a new local branch that
hint: will track its remote counterpart, you may want to use
hint: "git push -u" to set the upstream config as you push.

my problem could be solved by deleting all current commits and just re-committing and pushing my current version... is that possible?

EDIT: Here is the output for

git log --graph --decorate --oneline --all

*   408ef30 (master) h
|\  
| * 7d4ecd3 (origin/master, origin/HEAD) new every
| * c63f869 every bug
| * a60a14a querydate bug fixed
| * 957a6d3 problem in every
| * 602891c problem in every
| * 9e827d2 problem in every
| | * 0463323 (HEAD -> v0.8, test) branch pushed to remote
| |/  
|/|   
* |   4b6d7c8 Merge branch 'master' of https://github.com/X/X
|\ \  
| |/  
| * 539e3dc pagedown removed, bibtex bug resolved
* | 631d55a once
* | 4aa7275 once
|/  

Upvotes: 3

Views: 2194

Answers (1)

torek

Reputation: 488453

Edit: now that we have the git log --graph --decorate --oneline --all output it's easier to see what's going on and make suggestions.

The easy way: BFG repo cleaner

There's a Java-based tool that basically does all the work below for you. I have never used it myself, but it has its own tag here on Stack Overflow.

The slightly harder way: git filter-branch

Since you know the name of the file, you can use git filter-branch to do all of the steps below in an automated fashion. Simply use the index filter with git rm --cached --ignore-unmatch path/to/X.sql, and filter all of master, test, and v0.8 branches. But in general, you should only use filter-branch once you know what you are doing, and can follow all the steps below.
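As a rough sketch of the filter-branch route, here is a throwaway repository that reproduces the "add a big file, then delete it" history and then filters it. The names big.sql and app.txt and the demo history are made up for illustration; in the real repository the path would be path/to/X.sql and the branch list would be master test v0.8. Try it on a copy of the repository first.

```shell
#!/bin/sh
# Throwaway demo repository; all file names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'base'

echo 'pretend this is 1.5 GB' > big.sql
echo code > app.txt
git add big.sql app.txt
git commit -q -m 'once (adds big.sql by accident)'

git rm -q big.sql
git commit -q -m 'once (removes big.sql again)'

# Rewrite every commit on master, dropping big.sql from each commit's index.
# --ignore-unmatch keeps "git rm" from failing on commits that lack the file.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch big.sql' \
    -- master >/dev/null 2>&1

# Count objects reachable from master that are still named big.sql.
leftover=$(git rev-list --objects master | grep -c ' big\.sql$' || true)
echo "big.sql objects left on master: $leftover"
```

Note that filter-branch keeps the old commits reachable under refs/original/, so the big file stays in the local repository until those backup refs are deleted.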

The hard way: a lot like what filter-branch will do, but manually

This has gotten quite long, and quoting here is going to make it longer still, but I think it is clearer this way, so here we go. :-)

The actual graph looks like this:

* 408ef30 (master) h
|\  
| * 7d4ecd3 (origin/master, origin/HEAD) new every
| * c63f869 every bug
| * a60a14a querydate bug fixed
| * 957a6d3 problem in every
| * 602891c problem in every
| * 9e827d2 problem in every
| | * 0463323 (HEAD -> v0.8, test) branch pushed to remote
| |/  
|/|   
* |   4b6d7c8 Merge branch 'master' of https://github.com/X/X
|\ \  
| |/  
| * 539e3dc pagedown removed, bibtex bug resolved
* | 631d55a once
* | 4aa7275 once
|/  

We can see from this that v0.8 is a branch (not a tag: useful to know but makes no difference in terms of pushing, just in terms of what we do when fixing things). That particular branch points to commit 0463323. There's an extra branch name, test, that also points to this same commit.

The parent of 0463323 is 4b6d7c8 Merge branch 'master' of .... Because 4b6d7c8 is a merge, it has two parents. Those two parent commits are 539e3dc pagedown removed, bibtex bug resolved and 631d55a once.

Commit 539e3dc is on origin/master and hence is already on GitHub. It cannot possibly have the large file in question, nor can any of its parent commits (which are also on GitHub). Commit 631d55a, however, is not on GitHub, nor is its parent commit 4aa7275. There's one row further down that is missing but we can see from the |/ line that commit 4aa7275 and commit 539e3dc must have their histories join up at whatever commit goes there.

We still cannot tell for sure where the big file has crept in, nor where it was subsequently removed, but we start with only four possibilities:

  • 0463323 branch pushed to remote (which, despite its name, is not actually pushed; it's failed-to-push)
  • 4b6d7c8 Merge branch 'master' of ...
  • 631d55a once
  • 4aa7275 once

(these four were also in the git log v0.8 --not --remotes=origin output).

The reasoning in the original answer is still valid. The big file cannot be in the topmost commit (the one pointed-to by branch name v0.8), because we would see the big file there. It cannot be in that commit's parent either, because if it were, we would see the file as being deleted when we look at the top-most commit, and we don't.

That leaves commits 631d55a once and 4aa7275 once as the remaining potential culprits. At least one of these commits, and possibly both, has the big file that we do not want.

We can fix this, but...

Starting from v0.8 and following the chain down (back through history) is what found us these four candidate commits. However, take a look at the top of the graph, where the commit labeled master (408ef30 h) resides. This commit is also a merge commit, with two parents. One parent is 7d4ecd3 new every, labeled origin/master. The other parent is 4b6d7c8 Merge branch 'master' of ....

This merge commit 4b6d7c8 connects to the pagedown removed commit that is fine and is on origin/master, but also to the most recent of the two once commits that we suspect are bad.

What this means is that in order to clear this all up, we need to:

  1. Find which of the two once commits is bad (it may be both of them); we no longer want these.
  2. Write at least one new commit that we do want, that has as its parent, the parent of 4aa7275 once: the commit not shown that is just off the bottom of the graph.

There are multiple ways to go about this, but here is the one I think is simplest. I'm assuming that there is something good in the two once commits, and that you do want a merge after these two commits, and that you do want to create a branch called v0.8 subsequent to the merge, and that you do want master to be a merge commit atop most of this new chain, including the intermediate merge commit, that merges origin/master back into the new chain.

If these assumptions are wrong, this is not what you want to do (nor are the filter-branch or BFG cleaner "easy" methods really what you want). But this is all beyond the scope of this answer.

In any case, before we take any steps, the work tree should be clean (git status should show nothing to commit, and we should not have modified files that can be staged for commit). If you have in-progress work, you will need to commit or stash it (this commit or stash can be added on to the "repair" branch later if desired). I'll assume, though, that the work tree is clean.

Making a new "repaired" branch

The first step is to get a new branch, in which we'll do the right things. This new branch should branch off from the parent commit of 4aa7275 once, which is also the parent commit of 539e3dc pagedown removed, bibtex bug resolved. If we had the actual ID of that particular commit we could use it here, but we don't. Instead, we can use the ^ or ~ suffix syntax from gitrevisions:

git checkout -b repairwork 539e3dc~1

This creates a new branch named repairwork pointing to the parent commit that is just off the bottom of our graph.
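If you would rather not rely on the ~1 suffix, git merge-base should name the same commit, since it finds the best common ancestor of the two sides. The sketch below checks this in a throwaway repository; the branch names side-a and side-b are stand-ins I made up for 4aa7275 and 539e3dc.

```shell
#!/bin/sh
# Throwaway demo: two ways to name the commit "just off the bottom of the graph".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'fork point'
fork=$(git rev-parse HEAD)

# side-a stands in for 4aa7275 "once".
git checkout -q -b side-a
echo a > a.txt
git add a.txt
git commit -q -m 'once'
side_a=$(git rev-parse HEAD)

# side-b stands in for 539e3dc "pagedown removed, bibtex bug resolved".
git checkout -q -b side-b "$fork"
echo b > b.txt
git add b.txt
git commit -q -m 'pagedown removed'
side_b=$(git rev-parse HEAD)

by_suffix=$(git rev-parse "${side_b}~1")             # first parent of side-b
by_merge_base=$(git merge-base "$side_a" "$side_b")  # best common ancestor
echo "suffix:     $by_suffix"
echo "merge-base: $by_merge_base"
```

In the real repository the equivalent check would be comparing git rev-parse 539e3dc~1 against git merge-base 4aa7275 539e3dc before creating repairwork.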

Next, we want to take the good parts of 4aa7275, without taking the bad parts:

git cherry-pick -n 4aa7275

The -n (which you can spell out as --no-commit) tells git cherry-pick to extract the changes from 4aa7275, but not to commit them yet. Now git status will show changes staged for commit.

Let's say, for simplicity, that the commit we just cherry-picked is the one that adds the large file that we don't want. All we have to do is remove it: git rm -f hugefile, for instance (the -f is needed because the file is now staged but is not in the HEAD commit, and plain git rm refuses to remove a file in that state). Or, perhaps commit 631d55a once is the commit that removes it, and you'd like to squash whatever other changes are in it into this new commit. In that case, instead of git rm -f hugefile you can just do another git cherry-pick -n, this time for 631d55a.

Let's say, for simplicity again, that while 631d55a removes the big file, it contains some additional change that you'd like to keep separate, i.e., you want to still have two commits. In this case you should git rm the huge file, git commit the result, and then git cherry-pick 631d55a (without -n / --no-commit: since it does not add the huge file it's OK to just commit now).
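Here is a minimal sketch of the cherry-pick -n step, run in a throwaway repository. The file names hugefile and notes.txt are hypothetical, and the single "bad" commit stands in for 4aa7275. One detail worth noticing: right after cherry-pick -n the big file is staged but not in HEAD, so the sketch uses git rm -f (plain git rm complains about staged changes in that state).

```shell
#!/bin/sh
# Throwaway demo: copy a commit's good changes without its huge file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'base'
base=$(git rev-parse HEAD)

# The "bad" commit: useful changes plus a file that should never be pushed.
echo 'pretend this is huge' > hugefile
echo useful > notes.txt
git add hugefile notes.txt
git commit -q -m 'once'
bad=$(git rev-parse HEAD)

# Start the repair branch at the bad commit's parent.
git checkout -q -b repairwork "$base"

# Stage the bad commit's changes without committing them yet.
git cherry-pick -n "$bad"

# Drop the huge file from both the index and the work tree, then commit.
git rm -q -f hugefile
git commit -q -m 'once'

# How many objects named hugefile are reachable from the repaired branch?
huge_count=$(git rev-list --objects repairwork | grep -c ' hugefile$' || true)
kept=$(git ls-tree -r --name-only HEAD | grep -c '^notes\.txt$' || true)
echo "hugefile objects on repairwork: $huge_count, notes.txt kept: $kept"
```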

Let's draw what we have so far:

* xxxxxxx (HEAD -> repairwork) once
* xxxxxxx once
|
| * 408ef30 (master) h
| |\  
| | * 7d4ecd3 (origin/master, origin/HEAD) new every
| | * c63f869 every bug
| | * a60a14a querydate bug fixed
| | * 957a6d3 problem in every
| | * 602891c problem in every
| | * 9e827d2 problem in every
| | | * 0463323 (v0.8, test) branch pushed to remote
| | |/  
| |/|   
| * |   4b6d7c8 Merge branch 'master' of https://github.com/X/X
| |\ \  
| | |/  
| | * 539e3dc pagedown removed, bibtex bug resolved
| * | 631d55a once
| * | 4aa7275 once
| |/  
|//
*  xxxxxxx some commit msg

Note that everything we do here adds new commits to the repository. Git is much like the Borg from Star Trek, in that every time you do anything, you simply add to its collective. What we are doing here is adding new commits that strongly resemble the originals, except that the huge file is no longer included.

Now that we have the two once commits—or, if it makes more sense, have squashed the two once commits down to a single once commit—that are (or is) similar but omit(s) the giant file, we can redo the Merge branch 'master' of ... step, i.e., copy commit 4b6d7c8.

Unfortunately, there is no way to copy a merge directly. The easiest thing is just to re-perform the merge. We're on some new commit on repairwork so we can just run git merge 539e3dc. This will merge our new once commit(s) with 539e3dc pagedown removed, bibtex bug resolved in the same way that we did it before, when we ran git merge to create 4b6d7c8. When the merge is done and we have the opportunity to edit the merge commit message, we can put in whatever message we want, which may be the same "Merge branch 'master' ..." thing, or we can write our own more-meaningful message, such as "re-merge without huge file".

Let's draw part of this result:

* xxxxxxx (HEAD -> repairwork) "re-merge without huge file" 
|\
* | xxxxxxx once
* | xxxxxxx once

We're now at the point where we can create a corrected v0.8 branch.

All we have to do now is git checkout -b v0.8-fixed (it needs a different name, v0.8 is already in use) and then git cherry-pick v0.8 or git cherry-pick 0463323. Either cherry-pick command does the same thing: we're just resolving the name, v0.8, to the target commit. Once we've finished the cherry-pick, we are done with the old, broken v0.8, so we can rename it and rename our corrected one v0.8:

git checkout -b v0.8-fixed       # make new branch
git cherry-pick v0.8             # copy one commit to it
git branch -m v0.8 v0.8-broken   # rename broken branch
git branch -m v0.8               # rename our branch

If we git log --graph --decorate --oneline --all now, it starts like this:

* xxxxxxx (HEAD -> v0.8) branch pushed to remote
* xxxxxxx (repairwork) "re-merge without huge file" 
|\
* | xxxxxxx once
* | xxxxxxx once

It should now be possible to push v0.8 to the remote. This still has four commits to transfer, but none of these four have the huge file.

We can also delete the old test branch now (git branch -D test) and make test point to the current commit (git branch test).

Note that the huge file is still in our repository:

  • It's under v0.8-broken, which has that chain of four commits, at least one of which has the huge file.

    We can simply forcibly delete v0.8-broken once we're sure we are done with it, i.e., once the "fixed" v0.8 is pushed and all looks good to everyone.

  • It's also underneath master, though, as we have not yet repaired master: one of master's parents is 4b6d7c8 Merge branch 'master' of https://github.com/X/X and that particular commit has 631d55a once as one of its parents, and 631d55a and/or 4aa7275 have the huge file.

We can repair master by this same process, namely making new "good" or "repair" branches, then copying commits and/or re-doing the merges. Making a new branch will lose the current master upstream setting (though that's easily fixed as well). There is a shortcut to repairing master though, due to the fact that there is just the one merge to re-do. We can get onto master, hard-reset it to a good commit, then re-do the merge:

git checkout master
git reset --hard <some commit>
git merge <another commit>

When we do this we have our choice of which commit to hard-reset-to, and which one to merge. The merge result has, as its first parent, the commit that we hard-reset-to. Its second parent is whatever commit we name in the git merge command.
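To see the parent ordering for yourself, here is a small sketch in a throwaway repository. The branch name theirs is a made-up stand-in for origin/master, and the commit messages are only labels; the point is that the reset target ends up as the first parent and the git merge argument as the second.

```shell
#!/bin/sh
# Throwaway demo: which commit becomes which parent after reset + merge.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'base'

# "theirs" stands in for origin/master.
git checkout -q -b theirs
echo t > theirs.txt
git add theirs.txt
git commit -q -m 'new every'
theirs=$(git rev-parse HEAD)

# Back on master: one good commit, then one we want to throw away.
git checkout -q master
echo m > mine.txt
git add mine.txt
git commit -q -m 're-merge without huge file'
good=$(git rev-parse HEAD)
echo junk > junk.txt
git add junk.txt
git commit -q -m 'h (stand-in for the old, broken merge)'

# Move master back to the good commit, then redo the merge.
git reset -q --hard "$good"
git merge -q --no-edit "$theirs"

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

Swapping the two (reset to theirs, merge good) gives the same tree but the opposite parent order, which is what matters for tools that follow first parents.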

In your original sequence, the first parent is the other merge, and the second is origin/master. This may be what you want, although it has been nicknamed a "foxtrot merge" and it is often the wrong way around. (It's what you get from using git pull, and git pull is usually the wrong thing to do, for reasons described in that other question and its links.)

(Original answer below line.)


As I noted in a comment on your other question, git push works by identifying which commits you have in common with the remote you're pushing to, and which commits you have that they don't.[1] In this case the remote is named origin. We cannot tell which commits you and they have in common, and which ones you have that they don't, from this:

git push --set-upstream origin v0.8

but you can. We'll get to that in a moment. First, here's the same background information as in the comment I made, but in more detail.

Your git push command needs to send the commit (or annotated tag object) to which v0.8 resolves (I am guessing this is the 046332334e1f944f64a110f92434cdc26e9fafd0 you are showing, although you have not shown how you got this particular ID). Your git push sends this commit, plus whatever other commits, trees, and blobs are needed, and then asks their Git to set a branch or tag (it's not obvious which one this is) named v0.8 to point to that commit ID. You and they will then be in sync, at least with respect to this v0.8.

Somewhere associated with this set of commits that your Git will push, there is a Git tree with a very large file (or blob) object. Exactly which commit is something you will have to pin down and then do something about.

Here is an example of how such a thing comes about. Suppose, for instance, that you start in sync with the upstream repository. You then add, on an existing or new branch, a new commit, by doing something like this:

git add . && git commit -m 'add stuff'

In this "stuff" is that enormous file. Whoops, well, we can just remove it and commit again, right?

git rm bigfile && git commit -m 'rm 1.5 GB file'

If we tried to push at this point, the push would fail, because they (the remote, in this case GitHub) have something set up to detect and reject large files. We'll be pushing two commits: one that adds bigfile and a second one that deletes it. This means we have to push the big file itself, which takes forever because your data rate is limited (approximately 500 MiB at approximately 72 kiB/s = about 7111 seconds = about 118.5 minutes = nearly two hours).
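The estimate above is easy to redo with shell arithmetic. This is only the same back-of-the-envelope calculation, using the rounded 500 MiB and 72 KiB/s figures; shell integer division truncates, so the minutes come out as 118 rather than 118.5.

```shell
#!/bin/sh
# Rough transfer-time estimate: size divided by rate.
size_kib=$((500 * 1024))            # 500 MiB expressed in KiB
rate_kib_per_s=72                   # observed upload rate
seconds=$((size_kib / rate_kib_per_s))
minutes=$((seconds / 60))           # integer division, rounds down
echo "$seconds seconds, about $minutes minutes"
```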

Apparently it's not this particular point, though, because if it were, assuming your git diff-tree argument is correct, we'd see the removal of the big file in the diff-tree output. However, if we don't push yet, but instead go on to add still more commits, and then push, we will still have to push the enormous file: it's in one of those commits, and we have to push all of them. A commit is only valid if its ID matches the hash of all of its contents, and a commit's contents include the IDs of its parent commits, which include their parents, and so on, all the way back to the beginning of time.[2] A repository must have all of the intermediate commits in order to have all the final commits.[3]

The trick, then, is to find the commit(s) that refer to the big file. Only you can do that because only you have the big file.

How to find the commit(s)

Here is how to list the commits your Git will push. Start by running git fetch origin to update your repository if needed—it's probably not needed, but it's usually worth doing anyway—and then run this command:

git log v0.8 --not --remotes=origin

(this is not quite perfect, as it ignores tags on origin, but at worst this will list too many commits, not too few).

The idea here is simple: your remote-tracking branches record every commit they have on every branch that they have. (This is why we ran git fetch, to get this information updated.) You have some commit(s) on v0.8 that they do not. We use v0.8 to select every commit that is on v0.8, but then add --not --remotes=origin to de-select every commit that is on any origin/* remote-tracking branch. (This is where the error creeps in: we should also exclude commits they have on tags they have, but we cannot easily tell which tags they have, at this point. If Git kept "remote tags", instead of stuffing them all into a single namespace, we could fix that here.)

Whatever is left, is probably a commit we have to push, so we can git log those. Add -m -p --name-status to get a name-and-status diff of every commit (including pesky merge commits, which git log normally skips diff-ing; this is the -m flag).
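If the goal is just to spot the big file, another option is to list the blobs in the to-be-pushed commits together with their sizes. The sketch below does that in a throwaway repository (no remote is configured there, so --remotes=origin excludes nothing; the file names and the 2000-byte "big" file are made up). In the real repository you would use v0.8 in place of master.

```shell
#!/bin/sh
# Throwaway demo: find the largest blob in commits that would be pushed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'base'

head -c 2000 /dev/zero > big.sql    # stand-in for the 1.5 GB file
echo small > notes.txt
git add big.sql notes.txt
git commit -q -m 'add stuff'

git rm -q big.sql
git commit -q -m 'rm big file'

# Every object reachable from master but not from any origin/* branch,
# annotated with type and size; keep blobs only, largest first.
largest=$(git rev-list --objects master --not --remotes=origin |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' | sort -k2,2nr | head -n 1)
echo "largest blob: $largest"

# Once the path is known, ask which commit added it.
added_in=$(git log --format=%s --diff-filter=A master -- big.sql)
echo "added in commit: $added_in"
```

This works even though big.sql no longer exists in the latest commit, because both commands walk the history rather than the work tree.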

We have even more information, though, so it's very likely you don't need to do that. Let's take a look at what your Git and GitHub's Git talked through:

Counting objects: 180, done.

From this, we know that after your Git and their Git had their conversation to determine which commits, trees, blobs, and annotated-tag objects you had, that they didn't, that your Git would have to send, your Git had 180 such objects.

Delta compression using up to 4 threads.
Compressing objects: 100% (92/92), done.

Your Git was able to compress 92 of those objects against objects that your Git knows that their Git has, or against objects your Git was sending, by virtue of the fact that if their Git has a commit, it also has every tree and blob that go with that commit, and every commit, tree, and blob in all of the history of that commit, back to the beginning of time. (See footnote 2 again.)

Writing objects: 100% (180/180), 538.00 MiB | 72.00 KiB/s, done.
Total 180 (delta 142), reused 110 (delta 87)

All 180 objects made it across. I'm not sure off-hand what the additional numbers really mean (just that they come from git pack-objects --thin).

remote: error: GH001: Large files detected. You may want to try ...
remote: error: Trace: eef60ca4521006cb11e4b7f181bc7a1a
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File X.sql is 1537.98 MB; this exceeds ...

All of these messages prefixed with remote: come from scripts that their Git runs. One of the things GitHub does is (obviously) to scan incoming commits for large files. It found one such, this X.sql at 1.5 GB (which compressed to 1/3 of its size since your Git only had to send a mere 0.5 GB :-) ).

One of them says trace: and prints a Git hash value.

I cannot find any specifics on what this trace message is showing, but for it to be directly useful, it should be the commit ID.

You can test this for yourself:

git cat-file -t eef60ca4521006cb11e4b7f181bc7a1a

will show the type of the object in question (if it is a valid object). If it turns out to be a blob or tree, rather than a commit, then the reason it's not documented is that it's kind of useless—not that we cannot find a commit containing a specific tree or blob, but that they had the most-useful bit of information right there, but gave us less-useful information instead.
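A small sketch of that check, wrapped so that an unknown ID does not just print a fatal error. The helper name describe is made up, and the demo runs in a throwaway repository where the trace value is certainly not an object. Note also that the trace value has 32 hex digits while a full SHA-1 object ID has 40, so at best Git would treat it as an abbreviated ID.

```shell
#!/bin/sh
# Throwaway demo: report the type of an object ID, or say it is unknown.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name Demo
git config user.email demo@example.com

echo base > README
git add README
git commit -q -m 'base'

# Hypothetical helper: print the object type, or a note if Git cannot resolve it.
describe() {
    git cat-file -t "$1" 2>/dev/null || echo "not an object here"
}

commit_type=$(describe "$(git rev-parse HEAD)")
blob_type=$(describe "$(git rev-parse HEAD:README)")
trace_type=$(describe eef60ca4521006cb11e4b7f181bc7a1a)
echo "HEAD: $commit_type, README: $blob_type, trace: $trace_type"
```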

If it is the commit ID, look at that particular commit (git log -1 eef60ca4521006cb11e4b7f181bc7a1a, for instance). Then use something like git rebase -i to amend that commit, or to squash it together with a commit that removes the large file. Since the large file is not in the end-point commit, you have a removal commit in there already; whether it's suitable for squashing depends on the commit, and what you want to have show up in the commit history you present to the rest of the world by pushing.

Just for completeness:

To https://github.com/X/X.git
  ! [remote rejected] v0.8 -> v0.8 (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/X/X.git'

This tells us that the large-file-rejection happens in a pre-receive hook, and that you were pushing via https. The v0.8 on the left is your name and the v0.8 on the right is theirs. Git does not distinguish between branch and tag push failures even when explicitly pushing tags:

$ git push origin refs/tags/derp2
Total 0 (delta 0), reused 0 (delta 0)
remote: pre receive hook
remote: found tag
To [redacted]
 ! [remote rejected] derp2 -> derp2 (pre-receive hook declined)
error: failed to push some refs to '[redacted]'

although successes are reported as new tag. (I set up a test pre-receive hook that simply rejects all tags, to check this).
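For reference, a hook along these lines could look roughly like the sketch below. This is a guess at the shape of such a test hook, not the actual one: the function name reject_tags and the sample IDs are made up. A pre-receive hook gets one line per pushed ref on standard input, in the form "old-id new-id refname", and a non-zero exit rejects the whole push.

```shell
#!/bin/sh
# Hypothetical pre-receive hook body, written as a function so it can be tried
# without a server: reject the push if any incoming ref is a tag.
reject_tags() {
    echo "pre receive hook"
    while read -r old new ref; do
        case "$ref" in
            refs/tags/*)
                echo "found tag"
                return 1
                ;;
        esac
    done
    return 0
}

# Made-up object IDs, just to feed the function realistic-looking input.
zero=0000000000000000000000000000000000000000
new=1111111111111111111111111111111111111111

if printf '%s %s %s\n' "$zero" "$new" refs/tags/derp2 | reject_tags >/dev/null; then
    tag_result=accepted
else
    tag_result=rejected
fi

if printf '%s %s %s\n' "$zero" "$new" refs/heads/v0.8 | reject_tags >/dev/null; then
    branch_result=accepted
else
    branch_result=rejected
fi

echo "tag push: $tag_result, branch push: $branch_result"
```

Installed as hooks/pre-receive in a bare repository (with the function body as the script), the echo lines are what would show up prefixed with remote: on the pushing side.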


[1] More precisely, your Git gets a list of names (branches, tags, and other references) and object IDs from their Git. These could, in general, be any type of object. Branch names, however, can only point to commits; tag names normally point to either an annotated tag, or directly to a commit. I have played with manually tagging blobs and trees, and this does work, but it's not normal.

[2] This structure, where non-leaf nodes of a tree carry hash values of their children, is called a hash tree or Merkle tree. In version control systems like Git and Mercurial, the commit graph is a DAG with its parent/child relationships reversed so that commits can be read-only, but the theory still applies.

[3] A shallow repository is one in which this rule is relaxed. Shallow repositories are by definition not authoritative, since their Merkle trees cannot be verified. Git's implementation currently only allows shallow repositories to work in the "fetch" direction (the Git doing the fetching gets correct parent IDs for each "uprooted" commit, but then stubs off the graph with a special graft entry to make it act as if it were a root commit). The sender and receiver must both cooperate to make this work.

Upvotes: 2
