Reputation: 1039
I have a number of shapefiles in my repo that were too large, which caused my push to GitHub to fail. I first tried creating a .gitignore file that excludes most of the extensions in a shapefile bundle, but git still tried to push the shapefiles. After some searching, I found I had to clear the cache:
git rm -rf --cached .
git add .
However, when I committed and pushed again, this did not fix the problem: the same shapefile was hanging things up. After much messing around, I abandoned that idea and moved all the shapefiles out of the repo. I cleared the cache again, re-added everything, committed, and attempted to push to GitHub.
The push failed. The shapefile (which is no longer in the repo) was too large for a push. How can that happen? I feel like files that are not in the commit, because they aren't in the repo, should not be able to hang up the push. Any thoughts on what is happening here would be most appreciated.
UPDATE: Current status of rebase options...
noop
# Rebase 133c6ec..133c6ec onto 133c6ec
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
UPDATE: Reflog (it all starts with 'Adding many images'):
133c6ec HEAD@{0}: rebase -i (finish): returning to refs/heads/master
133c6ec HEAD@{1}: rebase -i (start): checkout refs/remotes/origin/master
133c6ec HEAD@{2}: rebase -i (finish): returning to refs/heads/master
133c6ec HEAD@{3}: rebase -i (start): checkout refs/remotes/origin/master
133c6ec HEAD@{4}: rebase -i (finish): returning to refs/heads/master
133c6ec HEAD@{5}: rebase -i (start): checkout refs/remotes/origin/master
133c6ec HEAD@{6}: rebase -i (finish): returning to refs/heads/master
133c6ec HEAD@{7}: rebase -i (pick): still dealing with shp bs
0f81c71 HEAD@{8}: rebase -i (pick): Removing shapefiles
91cb472 HEAD@{9}: rebase -i (pick): Adding comments from Mullins consult - through rev chapter
83c1269 HEAD@{10}: rebase -i (pick): Adding comments from Mullins consult - through rev chapter
7677b3f HEAD@{11}: rebase -i (pick): Hopefully .gitignore is now working
97aa005 HEAD@{12}: rebase -i (pick): Adjusting gitignore
9e912cb HEAD@{13}: rebase -i (pick): Adjusting gitignore
06647c0 HEAD@{14}: rebase -i (squash): Adding many images
259d73b HEAD@{15}: rebase -i (squash): # This is a combination of 2 commits.
3b2d5e8 HEAD@{16}: rebase -i (start): checkout refs/remotes/origin/master
a585f1d HEAD@{17}: rebase: aborting
7bc98a4 HEAD@{18}: rebase -i (start): checkout refs/remotes/origin/master
a585f1d HEAD@{19}: rebase -i (finish): returning to refs/heads/master
a585f1d HEAD@{20}: rebase -i (start): checkout 9f28970
a585f1d HEAD@{21}: rebase -i (finish): returning to refs/heads/master
a585f1d HEAD@{22}: rebase -i (start): checkout refs/remotes/origin/master
a585f1d HEAD@{23}: rebase: aborting
eaadebf HEAD@{24}: rebase -i (pick): Adding comments from Mullins consult - through rev chapter
7bc98a4 HEAD@{25}: rebase -i (start): checkout refs/remotes/origin/master
a585f1d HEAD@{26}: commit: still dealing with shp bs
4bef02c HEAD@{27}: commit: Removing shapefiles
cc061ac HEAD@{28}: commit: Adding comments from Mullins consult - through rev chapter
21c5ab7 HEAD@{29}: commit: Adding comments from Mullins consult - through rev chapter
9f28970 HEAD@{30}: commit: Hopefully .gitignore is now working
a2bdbae HEAD@{31}: commit: Adjusting gitignore
c3e5128 HEAD@{32}: commit: Adjusting gitignore
8f8b96e HEAD@{33}: commit: Adding gitignore to avoid tracking shapefiles
0c14e14 HEAD@{34}: commit: Adding gitignore to avoid tracking shapefiles
3b2d5e8 HEAD@{35}: commit: Adding many images
Upvotes: 3
Views: 2481
Reputation: 490178
The first thing to remember here is that git push pushes commits. If any files happen to be included, that's purely because they are needed for the commits being pushed.
The second thing to remember is that when you do git push, your git and "their" git (a git command running on GitHub, in this case) generally have a little talk with each other, of the form:
"I have commit <SHA-1> I'd like to give you, and then I'd like you to set your branch master (or whatever other branch) to point to that SHA-1."
"Well, I have <different SHA-1> for that branch. Tell me which SHA-1s I need to fill in any holes between what I have and what you have." (There's more to it than this, and it goes in a different order, but the essence is an exchange of who-has-what commit and other object IDs.)
Once they know which IDs the other has, the sender packages up "whatever is needed": a series of commits (possibly empty) plus any other objects, mainly (but not only) files, that go with those commits and that the receiver does not already have. In this case the sender is your git, the receiver is their git, and the package includes some large file(s).
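A toy model may make this concrete. The sketch below treats a push as "every object reachable from the commit you push, minus every object reachable from what GitHub already has". The commit names G, H, I and the file big.shp are hypothetical, and real git negotiates and packs objects quite differently in detail; only the set arithmetic is the point.

```python
# Toy model only: real git has content-addressed commit, tree and blob
# objects and a more involved negotiation, but the set arithmetic is the idea.

# Hypothetical history: G is on GitHub, H adds big.shp, I removes it again.
# Each commit records its parent (or None) and a tree mapping path -> blob id.
COMMITS = {
    "G": {"parent": None, "tree": {"README": "blob-readme"}},
    "H": {"parent": "G", "tree": {"README": "blob-readme", "big.shp": "blob-big"}},
    "I": {"parent": "H", "tree": {"README": "blob-readme"}},
}


def reachable(tip):
    """Return the ids of every commit and blob reachable from commit `tip`."""
    seen = set()
    commit = tip
    while commit is not None:
        seen.add(commit)
        seen.update(COMMITS[commit]["tree"].values())
        commit = COMMITS[commit]["parent"]
    return seen


def objects_to_send(local_tip, remote_tip):
    """What the sender must package: reachable locally but not remotely."""
    return reachable(local_tip) - reachable(remote_tip)


# Pushing I while GitHub has G: H comes along, and so does H's big blob.
print(sorted(objects_to_send("I", "G")))  # ['H', 'I', 'blob-big']
```

Even though I's tree no longer lists big.shp, H is still part of the history being pushed, so its big blob goes along with it.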
You decided you did not want to send the large files. This means you must replace the commit(s) you're asking your git to send with some new commit(s) you will ask your git to send. If the new commit(s) do not refer to the large files, then when your git goes to send the commits, their git will not ask for the files either, and your git won't send them.
The Pro Git book has a section on "rewriting history" that covers this fairly well. What's missing (at least if you read only that one section; other sections do cover it) is a diagram of what a rebase really does.
(Incidentally, your git repo will still contain the large files, and will continue to do so until all references to those files are gone, including the sort of ghost references that linger in git's "reflogs" after history-rewrite operations. These lingering ghosts are what let you resurrect files if you make a mistake during the history rewrite. By default, reflog entries for commits that are no longer reachable from the branch tip, like these, expire after 30 days, while entries that are still reachable persist for 90 days; unless you're doing something unusual, you can just let them expire on their own.)
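The expiry rule above is just date arithmetic. A minimal sketch, assuming git's default settings for gc.reflogExpire (90 days) and gc.reflogExpireUnreachable (30 days); `would_expire` is a hypothetical helper, not part of git, and the actual pruning only happens when something like `git gc` or `git reflog expire` runs.

```python
from datetime import datetime, timedelta

# Assumed defaults: gc.reflogExpire = 90 days for entries still reachable from
# the branch tip, gc.reflogExpireUnreachable = 30 days for the rest.
# Both are configurable; these constants only mirror the defaults.
EXPIRE_REACHABLE = timedelta(days=90)
EXPIRE_UNREACHABLE = timedelta(days=30)


def would_expire(entry_time, now, reachable_from_tip):
    """Return True if a reflog entry is older than its default expiry window."""
    window = EXPIRE_REACHABLE if reachable_from_tip else EXPIRE_UNREACHABLE
    return now - entry_time > window


now = datetime(2024, 6, 1)
ghost = now - timedelta(days=45)  # e.g. an entry for a pre-rebase commit
print(would_expire(ghost, now, reachable_from_tip=False))  # True
print(would_expire(ghost, now, reachable_from_tip=True))   # False
```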
The git rebase documentation has some diagrams, such as this one:
      A---B---C topic
     /
D---E---F---G master
[becoming]
              A'--B'--C' topic
             /
D---E---F---G master
The individual letters stand for commits. The result has A' instead of A, and so on, because rebase doesn't (can't) actually move a commit; it can only make a copy. The original commits are still in there; they just no longer have a label like topic keeping them visible. Because the copies are different (and they are, if only because their parents differ), they have new, different SHA-1s. It's the SHA-1s that really matter, at least during push (and fetch).
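To see why A' cannot share A's SHA-1, here is a simplified sketch: it hashes a made-up commit format that, like git's, includes the parent ID, then "rebases" by re-creating the same trees on a different base. The tree and commit names are hypothetical placeholders, and the trees are held fixed here (in a real rebase they usually change too), which shows that the new parent alone is enough to change every ID.

```python
import hashlib


def commit_id(tree, parent, message):
    """Hash a simplified commit: like git, the parent ID is part of the content."""
    # Git's real commit objects also include author/committer lines with
    # timestamps and a "commit <size>\0" header; this keeps only the idea.
    body = f"tree {tree}\nparent {parent}\n\n{message}\n"
    return hashlib.sha1(body.encode()).hexdigest()


def replay(commits, base):
    """Copy (tree, message) pairs one by one on top of `base`; return new IDs."""
    ids = []
    parent = base
    for tree, message in commits:
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


topic = [("tree-A", "A"), ("tree-B", "B"), ("tree-C", "C")]
originals = replay(topic, "E")  # A, B, C as first committed on top of E
copies = replay(topic, "G")     # A', B', C' after rebasing onto G

for old, new in zip(originals, copies):
    print(old[:7], "->", new[:7])
```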
In your case, what you want to do when rebasing is to make "deliberately flawed copies": the originals have the large files, and the "flawed" copies don't. (In fact, of course, "having the large file" is the real flaw, so the imperfect copies are the right ones and the originals are wrong!)
Interactive rebase has the additional ability to "squash" a new commit into an existing commit, i.e., to take the copy it's going to make and modify it based on the next commit in the sequence.
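A squash can be sketched in a few lines: the combined commit keeps the first commit's parent but takes the second commit's tree. The commits and file names below are hypothetical, and the message handling is simplified (git actually opens an editor with both messages; "fixup" keeps only the first).

```python
def squash(first, second):
    """Meld `second` into `first`: first's parent, second's tree, both messages."""
    # Simplified: "fixup" would keep only first["message"].
    return {
        "parent": first["parent"],
        "tree": dict(second["tree"]),
        "message": first["message"] + "\n\n" + second["message"],
    }


# Hypothetical commits: H adds big.shp on top of G, I removes it again.
commit_h = {"parent": "G", "tree": {"README": "v2", "big.shp": "blob-big"},
            "message": "H: add images, including big.shp"}
commit_i = {"parent": "H", "tree": {"README": "v2"},
            "message": "I: remove big.shp"}

commit_j = squash(commit_h, commit_i)
print(commit_j["parent"], sorted(commit_j["tree"]))  # G ['README']
```

The resulting commit sits directly on G and never mentions big.shp, so nothing in it refers to the large file.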
The other big difference between what you want to do, and what is in the diagram above, is that you want the new commit(s) to start from the same point as the originals:
        H - I   <-- master [in your repo]
       /
... - G   <-- origin/master [i.e., what's on GitHub as master]
Here commit H might be the flawed one with the extra file(s), and I might be the commit that removes them. If you ask to push your master to GitHub and have GitHub set that as its master, your git and their git will chat and decide that GitHub needs H and I and some files, including, because of H, the big ones.
If you rewrite your own history so that in place of H and I you have one single new H'I' commit (let's just call this J for simplicity), then you'll have this diagram instead:
        H - I   [abandoned ghosts]
       /
... - G   <-- origin/master [i.e., what's on GitHub as master]
       \
        J   <-- master
Now you can have your git call up GitHub's git and propose sending just J, which does not have the big files in it.
Note that there are two keys to all of this:
Commit G (as pointed to by origin/master) does not have the big files, and it is in both your repository and the repository on GitHub. It's the shared starting point for the push: the first commit your side leaves out when pushing, because their side already has it.
Commit J (and any other commits you will push) must also not have the big files. That way, when your git talks with their git, your git will decide that what it needs to send does not include the big files.
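Both conditions can be checked before pushing. In a real repository, `git rev-list --objects origin/master..master` lists the objects reachable from master but not from origin/master, i.e. roughly what the push would send. The sketch below does the same check on a hypothetical in-memory history matching the diagrams; `oversized_in_push` is an illustrative helper, and the 100 MiB limit is an assumption based on GitHub's documented per-file cap, so confirm it against GitHub's current docs.

```python
# Assumed limit: GitHub documents a 100 MiB per-file cap; verify before relying on it.
LIMIT_BYTES = 100 * 1024 * 1024

# Hypothetical repo matching the diagrams: H - I on top of G, and J rewritten on G.
COMMITS = {
    "G": {"parent": None, "tree": {"README": "blob-readme"}},
    "H": {"parent": "G", "tree": {"README": "blob-readme", "big.shp": "blob-big"}},
    "I": {"parent": "H", "tree": {"README": "blob-readme"}},
    "J": {"parent": "G", "tree": {"README": "blob-readme"}},
}
BLOB_SIZES = {"blob-readme": 2_000, "blob-big": 150 * 1024 * 1024}


def history(tip):
    """List commits from `tip` back to the root, following parents."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = COMMITS[tip]["parent"]
    return out


def oversized_in_push(local_tip, remote_tip, limit=LIMIT_BYTES):
    """Return (commit, path) for big files in commits the remote lacks."""
    remote_has = set(history(remote_tip))
    problems = []
    for commit in history(local_tip):
        if commit in remote_has:
            continue  # the shared starting point and older: not sent
        for path, blob in COMMITS[commit]["tree"].items():
            if BLOB_SIZES[blob] > limit:
                problems.append((commit, path))
    return problems


print(oversized_in_push("I", "G"))  # [('H', 'big.shp')]
print(oversized_in_push("J", "G"))  # []
```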
In the end, it doesn't matter how many commits your side will send over in the push; what matters is what's in those specific commits. You can rewrite stuff that "only you have" as often (or as rarely) as you like, into whatever shape you like. But once you've successfully given those commits to another repository, "rewriting" them into new, slightly different copies does not remove the originals there: the other repo still has them, and other users may have picked them up as well (if that repository is generally accessible).
(You can still "rewrite history" and use a --force push to ask the other repo to discard some commit(s), including originals that you've decided are bad. There's nothing inherently wrong with this either; it's just that anyone collaborating with you may have picked up those "wrong originals" and may be using them, so you're making more work for those people too.)
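The reason --force is needed is the fast-forward rule: by default, git refuses a push that would move a remote branch to a commit that does not have the branch's current tip as an ancestor. A sketch of that ancestry check on the hypothetical G/H/I/J history from the diagrams (`PARENTS` and `is_fast_forward` are illustrative names, not git internals):

```python
# Hypothetical parent links matching the diagrams: H - I on G, and J on G.
PARENTS = {"G": [], "H": ["G"], "I": ["H"], "J": ["G"]}


def is_fast_forward(old_tip, new_tip):
    """True if old_tip is new_tip or one of its ancestors (a plain push is OK)."""
    stack, seen = [new_tip], set()
    while stack:
        commit = stack.pop()
        if commit == old_tip:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return False


# If GitHub already had I, moving its master to J would drop H and I.
print(is_fast_forward("I", "J"))  # False: needs --force
print(is_fast_forward("G", "J"))  # True: a normal push works
```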
One last note, on the fact that git rebase -i is showing an empty list: this suggests you've actually removed all the commits you had that they didn't. That is, instead of going from:
        H - I   <-- master
       /
... - G   <-- origin/master
to:
        J   <-- master
       /
... - G   <-- origin/master
you've somehow simply discarded H and I entirely, so that your master points to commit G too.
This could happen if you did a rebase -i and told git to squash I into H, and it did, and the result was exactly the same files as are in commit G. (For instance, if the only difference from G to H is "add big-file" and the only difference from H to I is "remove big-file", the combination of the two has no difference from G.) Git does allow an "empty" commit (a commit with author, message, etc., as usual, but with the same tree as the previous commit), but by default, rebase assumes you don't want that: it simply strips out the canceled-out commits entirely.
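The canceling-out case can be checked with plain dictionaries. In this sketch, trees are hypothetical path-to-blob mappings and `apply_changes` is an illustrative helper: squashing "add big-file" and "remove big-file" onto G yields exactly G's tree, which is the "empty" result rebase discards by default.

```python
def apply_changes(tree, changes):
    """Return a new tree with (path, blob_or_None) changes applied; None deletes."""
    result = dict(tree)
    for path, blob in changes:
        if blob is None:
            result.pop(path, None)
        else:
            result[path] = blob
    return result


G_TREE = {"README": "v1"}
H_CHANGES = [("big.shp", "blob-big")]  # "add big-file"
I_CHANGES = [("big.shp", None)]        # "remove big-file"

squashed = apply_changes(apply_changes(G_TREE, H_CHANGES), I_CHANGES)
print(squashed == G_TREE)  # True: same tree as G, so rebase drops it

# If H had also edited README, the squashed commit is not empty and survives.
h_with_edit = H_CHANGES + [("README", "v2")]
with_edit = apply_changes(apply_changes(G_TREE, h_with_edit), I_CHANGES)
print(with_edit == G_TREE)  # False
```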
If you did have other commits that have vanished, those "ghost commits" I mentioned earlier are just what you need. To find them, look in the "reflogs":
$ git reflog
and:
$ git reflog master
These reflogs keep a history of where HEAD and master (or any other branch) have pointed over the last 90 days or so: the raw SHA-1 IDs of the commits, whether they're regular commits that will stick around forever or lingering ghost commits retained only by the reflog entries.
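The reflog in the question has a very regular shape, so the entries made by plain `git commit` (the originals, from before any rebase) can be picked out mechanically. This is only a convenience sketch over pasted text; `git reflog` and `git log -g` are the real tools, and once you have an ID, `git branch <name> <sha>` gives that ghost commit a name again. `plain_commits` and `SAMPLE` are hypothetical names.

```python
import re

# One `git reflog` line: "<sha> <ref>@{<n>}: <action>: <message>".
REFLOG_LINE = re.compile(
    r"^(?P<sha>[0-9a-f]+) (?P<ref>\S+@\{\d+\}): (?P<action>[^:]+): (?P<message>.*)$"
)

# A few lines copied from the reflog in the question.
SAMPLE = """\
133c6ec HEAD@{0}: rebase -i (finish): returning to refs/heads/master
0f81c71 HEAD@{8}: rebase -i (pick): Removing shapefiles
a585f1d HEAD@{26}: commit: still dealing with shp bs
4bef02c HEAD@{27}: commit: Removing shapefiles
3b2d5e8 HEAD@{35}: commit: Adding many images
"""


def plain_commits(reflog_text):
    """Return (sha, message) for entries made by `git commit`, newest first."""
    found = []
    for line in reflog_text.splitlines():
        match = REFLOG_LINE.match(line.strip())
        if match and match.group("action") == "commit":
            found.append((match.group("sha"), match.group("message")))
    return found


for sha, message in plain_commits(SAMPLE):
    print(sha, message)
```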
Upvotes: 3