I've created a directory called Exploratory Data Analysis/Course Project 1/ with the file household_power_consumption.txt inside. When I tried to push it to my git repo for this project, I received a warning about the size of household_power_consumption.txt, so I added a .gitignore file with this line:
Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt
I also tried to follow the solutions posted here, here and here, but none of them works for me. I always receive the same error:
git push origin master
Counting objects: 31, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (28/28), done.
Writing objects: 100% (31/31), 20.48 MiB | 221.00 KiB/s, done.
Total 31 (delta 6), reused 0 (delta 0)
remote: error: GH001: Large files detected.
remote: error: Trace: 491a8219bf1d3de4fd08a8e3ea253faa
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File Exploratory Data Analysis/Course Project 1/household_power_consumption.txt is 126.80 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/jd901215/DataScience_CourseraSpecialization.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/jd901215/DataScience_CourseraSpecialization.git'
This is the local directory tree for my project:
├── Exploratory Data Analysis
│   └── Course Project 1
│       └── household_power_consumption.txt
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── FETCH_HEAD
│   ├── HEAD
│   ├── hooks
│   │   ├── applypatch-msg.sample
│   │   ├── commit-msg.sample
│   │   ├── post-update.sample
│   │   ├── pre-applypatch.sample
│   │   ├── pre-commit.sample
│   │   ├── prepare-commit-msg.sample
│   │   ├── pre-push.sample
│   │   ├── pre-rebase.sample
│   │   └── update.sample
│   ├── index
│   ├── info
│   │   └── exclude
│   ├── logs
│   │   ├── HEAD
│   │   └── refs
│   │       ├── heads
│   │       │   └── master
│   │       └── remotes
│   │           └── origin
│   │               └── master
│   ├── objects
│   │   ├── (bunch of SHA-1 checksums, I guess)
│   │   ├── info
│   │   └── pack
│   ├── ORIG_HEAD
│   └── refs
│       ├── heads
│       │   └── master
│       ├── remotes
│       │   └── origin
│       │       └── master
│       └── tags
├── .gitignore
├── LICENSE
├── README.md
└── R programming
    ├── README.md
    ├── Week 1 Programming assignment
    │   ├── complete7.R
    │   ├── complete.R
    │   ├── corr.R
    │   ├── pollutantmean.R
    │   ├── .RData
    │   ├── .Rhistory
    │   └── specdata
    ├── Week 2 Programming assignment
    │   ├── cachematrix.R
    │   └── README.md
    └── Week 4 Programming assignment
        ├── best.R
        ├── hospital-data.csv
        ├── Hospital_Revised_Flatfiles.pdf
        ├── outcome-of-care-measures.csv
        ├── .Rhistory
        └── rprog-doc-ProgAssignment3.pdf
Sorry for the long post, but I'm trying to offer the necessary info. Thanks in advance.
First, it's worth noting that .gitignore entries don't mean what people at first think they mean. I'll get back to this (much) later.
It's obvious¹ that this output line:
remote: error: File Exploratory Data Analysis/Course Project 1/household_power_consumption.txt is 126.80 MB; this exceeds GitHub's file size limit of 100.00 MB
is telling you that at least one commit, in the set of commits you're trying to push, has the overly large file in it. (The Trace: line above it is most likely GitHub's own identifier for the rejected push rather than a commit ID: it is 32 hex digits long, while a full commit ID is 40.)
To get here, though, you have to know a bunch of things about git that are often poorly explained in various documents. (For a good explanation, see the Git Book.) In this particular case, one thing to know is that git push calls up a "remote", another computer that has its own separate git repository on it, and then your git asks that other git to take any of your new commits and add them to its own repository.
In this case, you and your git call the remote "origin", which is the standard name for "the place I cloned from originally".
For illustration, here's what happens with a simple case of cloning an origin in which there are just three commits, on branch master:
their git:
C1 <- C2 <- C3 <-- master
you:
$ git clone <url>
your git: (uses Internet-phone to call their git) "Here's a URL, what do you have?"
their git: "I have master, which points to commit C3."
your git: "OK, gimme, and oh, I see I also need C2 and C1."
their git: (gives your git a bundle of everything)
your git: "kthxbye!" (unpacks everything, creates a new repo that's the same as what you got from their git, plus the remote-name "origin" and the url)
Now that you have the clone, you do some work and make some commit(s). Let's call them C4 and C5. Each of these C<digit>s stands in for one of those big ugly 40-character SHA-1 hash IDs. Each commit "points back" at its parent commit, so C3 points to C2 and C2 points to C1. (Since C1 is the first commit, it has no outbound arrow: a commit points to zero or more parents, and the initial commit is the one with zero.) Let's draw in C4 and C5 now:
C1 <- C2 <- C3 <- C4 <- C5 <-- master
Note that the branch label master now points to commit C5, no longer to C3.
(Your git will have created an additional label, origin/master, pointing to C3. This is how your git can tell that you are "ahead 2" commits. Since your git is not always on the Internet-phone keeping up to date with their git, though, this information can get stale. But we'll ignore that for now.)
Eventually, you decide to:
$ git push origin master
This tells your git to get back on the Internet-phone to their git. The conversation now is a bit different:
your git: "I have some stuff for you."
their git: "Hm, well, gimme the stuff and I'll see. I have master and it's commit C3."
your git: "OK, since you're on C3, here's my C4 and C5."
their git: (starts doing checking, in this case, running a "pre-receive hook")
The "pre-receive hook" is arbitrary code that they (whoever they are) wrote. It can do whatever they want it to do, but in this case, it obviously (there's that word again :-) ) checks the new commits you're handing over. At least one of those two commits obviously has, in its associated source tree, a large file.
This arbitrary code could just say "no", but it's printing specifics about why it's saying "no": the path name of the large file and its size. That path name means the same thing in your repository as in theirs, so you can use it to see where things went wrong.
In this example, the file is in commit C4 or commit C5. (It could be in both commits; the remote might stop after the first problem it finds, not telling you about the rest. So don't assume only one commit is affected.)
To see which of your commits touch that file:
$ git log -- Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt
and to look at any one of them in detail (using a commit ID from that output):
$ git show <commit-id>
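If you want to check for oversized files yourself before pushing, you can approximate what the hook looks at: list every object in the commits that origin/master doesn't have yet, and report the blobs (file contents) above a size limit. This is a sketch, not GitHub's actual hook. It builds a throwaway repository in a temp directory, uses a 2 MB stand-in file and a made-up 1 MB limit (GitHub's real limit is 100 MB), and the variable names are invented for the demo:

```shell
#!/bin/sh
# Sketch: find oversized files in commits that origin/master does not have yet.
# Everything happens in a throwaway temp directory; the 1 MB limit is a demo
# value standing in for GitHub's 100 MB limit.
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"          # plays the part of "their git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master        # use the branch name from the text
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "readme" > README.md
git add README.md
git commit -q -m "C3: already on origin"
git push -q -u origin master                   # origin/master now points here

big="Exploratory Data Analysis/Course Project 1/household_power_consumption.txt"
mkdir -p "$(dirname "$big")"
head -c 2000000 /dev/zero > "$big"             # 2 MB stand-in for the real data file
git add "$big"
git commit -q -m "C4: add data file"
echo "more" >> README.md
git commit -q -am "C5: unrelated change"

limit=1000000
# rev-list --objects prints "<id> <path>" for every object reachable from master
# but not from origin/master; cat-file --batch-check adds each object's type and size.
large=$(git rev-list --objects origin/master..master |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  while read -r otype osize opath; do
    if [ "$otype" = blob ] && [ "$osize" -gt "$limit" ]; then
      printf '%s\n' "$opath"
    fi
  done)
printf 'too large: %s\n' "$large"
```

Because %(rest) passes the path through unchanged, names with spaces, like the one in the question, survive intact.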
Another command that can make this easier to see is:
$ git log --name-status
which shows you files added, deleted, or changed (by comparing each commit to its parent-commit).
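Restricting git log to the commits you haven't pushed yet (origin/master..master) keeps that output short, and --diff-filter=A narrows it to the commit(s) that added a given path. A sketch on a throwaway repository; the layout and commit messages are invented for the demo:

```shell
#!/bin/sh
# Sketch: find the unpushed commit that added a given path.
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "readme" > README.md
git add README.md
git commit -q -m "C3"
git push -q -u origin master

big="Exploratory Data Analysis/Course Project 1/household_power_consumption.txt"
mkdir -p "$(dirname "$big")"
echo "Date;Time;Global_active_power" > "$big"   # tiny stand-in for the data file
git add "$big"
git commit -q -m "C4: add data file"
c4=$(git rev-parse HEAD)
echo "more" >> README.md
git commit -q -am "C5: update README"

# Every unpushed commit, with the files it added (A), modified (M) or deleted (D).
git log --name-status --format='%h %s' origin/master..master

# Only the unpushed commit(s) that added the data file.
added_in=$(git log --diff-filter=A --format=%H origin/master..master -- "$big")
echo "added in: $added_in"
```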
Now, the real question is how to deal with this. In this particular case it's probably easiest to use an interactive rebase, as shown in the Rewriting History section of the Git Book. What you need to do here is to replace the original "bad" commit(s) with "good" commits.
(Git doesn't ever actually replace commits, it only adds new commits, but once you have the new ones, you and your git can pretend the old ones don't exist. Eventually—by default, in 30 days or so—the old forgotten-about commits "expire" and will be removed, but they're preserved until then. This preservation means you can always recover from any mistakes made during the interactive rebase.)
Let's say, for illustration, that the error happened in commit C4 (if it happened in C5 it's a lot easier to fix, and this whole interactive-rebase thing is not required, although it will still work). We're further going to say (this seems reasonable since your git push is failing) that you have not pushed C4 and C5 anywhere else yet. (If you have, life gets more painful for whoever got your current C4 and C5 commits.)
You start by running git rebase -i. Since you are on master and its "upstream" branch (as git calls it) is origin/master, this means "rebase master onto origin/master": that is, copy commits C4 and C5 onto the tip of origin/master to make new, copied-but-slightly-changed commits C4' and C5':
              C4 <- C5 <-- [master, before rebase]
             /
C1 <- C2 <- C3 <-- origin/master
             \
              C4' ... <-- temporary rebase branch
When the rebase commands come up in your editor, change the pick line for commit C4 to an edit line. Write out the instructions and exit the editor. (You don't need to do anything for C5, since it's going to be the tip-most commit once the rebase finishes. If you need to change it, you can simply git commit --amend it.)
The interactive rebase will cherry-pick the original commit and then stop, at a point where commit C4' is actually the same as C4 (except sometimes for some time stamps and other author/committer metadata). You can now "amend" the commit.
Since the commit adds a file you don't want, you need to git rm it first, perhaps with git rm --cached to leave the file in your work-tree:
$ git rm --cached Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt
If it seems necessary or appropriate, you might also add that path to a .gitignore file, and git add the .gitignore. This file only buys you a bit of convenience, so it's up to you whether to do that.
Now you're ready to use git commit --amend to make the modified commit C4' permanent, before continuing the rebase:
$ git commit --amend --no-edit
[detached HEAD d106870] blah blah ...
$ git rebase --continue
(The --no-edit flag skips opening the editor on the commit message. If you want to make a change to the commit message too, leave it out.)
The continue step simply finishes off the rebase by copying commit C5, adding it to the tip of your temporary branch:
              C4 <- C5 <-- [master, before rebase]
             /
C1 <- C2 <- C3 <-- origin/master
             \
              C4' - C5' <-- temporary rebase branch
and then, as all rebases do, erasing the label master and pasting it onto the temporary branch:
C1 <- C2 <- C3 <-- origin/master
             \
              C4' - C5' <-- master
Now that you have a set of commits that don't add the large file, you can re-run git push origin master. Your git and their git will re-do their conversation, but this time, your git will hand over commits C4' and C5', which omit the big file. Your git will then ask their git to label C5' as branch master, and presumably this time, it will be OK.
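If you'd rather script this whole sequence, or try it on a scratch repository first, git lets you replace the editor for the rebase instructions with a command via the GIT_SEQUENCE_EDITOR environment variable. The sketch below uses sed to turn the first pick line into edit, which assumes, as in the example, that the bad commit is the oldest unpushed one (C4). The throwaway repository, the commit messages and the plot1.R file are invented for illustration:

```shell
#!/bin/sh
# Sketch: the C4/C5 interactive rebase from the text, run non-interactively
# on a throwaway repository.
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "readme" > README.md
git add README.md
git commit -q -m "C3"
git push -q -u origin master                    # origin/master = C3

big="Exploratory Data Analysis/Course Project 1/household_power_consumption.txt"
mkdir -p "$(dirname "$big")"
echo "Date;Time;Global_active_power" > "$big"   # stand-in for the 126.80 MB file
echo "# plot code" > plot1.R                    # hypothetical file that should stay
git add "$big" plot1.R
git commit -q -m "C4: plot1.R plus data file"   # the "bad" commit
echo "more" >> README.md
git commit -q -am "C5: update README"

# Same as editing the instructions by hand: change the first "pick" (C4) to "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i origin/master

# The rebase is now stopped at C4. Untrack the file but keep it on disk,
# optionally ignore it, and amend the commit (this creates C4').
git rm -q --cached "$big"
printf '%s\n' "$big" >> .gitignore
git add .gitignore
git commit -q --amend --no-edit

GIT_EDITOR=true git rebase --continue           # copies C5 on top of C4' as C5'

# Check: no unpushed commit adds the file any more.
leaked=$(git log --diff-filter=A --format=%H origin/master..master -- "$big")
echo "commits still adding the file: ${leaked:-none}"
```

The sed trick only works because the bad commit is first in the list; with the bad commit elsewhere you would edit that line instead, or simply do it by hand in the editor as described above.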
Back to the .gitignore file: I sometimes think this file is misnamed, because the files listed in it are not necessarily files that git ignores. In particular, if a file is already tracked (that is, it is in the index), git won't ignore it. So if you git add-ed a file by mistake, committed that, and then put it in .gitignore, it's still in the earlier commit. This is true even if you have git rm-ed it since then.
(Entries in .gitignore do two things, and often the more important one is "keep git from griping about this file being untracked", so instead of .gitignore, a better name might be .git-shut-up-with-your-untracked-noise, or some such. But entries in this file also keep git add from adding the file to the index if it's not already there, so it also needs a name like .git-dont-add-files. So we just use .gitignore and have to understand that it doesn't really mean "ignore".)
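A quick way to see this on a scratch repository (data.txt is just a made-up name): once a file is tracked, a matching .gitignore entry changes nothing, and git check-ignore, which by default does not consider tracked paths at all, doesn't report it. Only after git rm --cached does the entry take effect.

```shell
#!/bin/sh
# Sketch: a .gitignore entry does not affect a file that is already tracked.
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "data" > data.txt
git add data.txt
git commit -q -m "track data.txt by mistake"

echo "data.txt" > .gitignore          # too late: data.txt is already in the index
echo "more data" >> data.txt

tracked=$(git ls-files data.txt)      # still listed, because it is in the index
st=$(git status --porcelain data.txt) # changes to it are still reported
if git check-ignore -q data.txt; then ignored=yes; else ignored=no; fi
echo "tracked: $tracked / status: $st / ignored: $ignored"

git rm -q --cached data.txt           # stop tracking it (the copy on disk stays)
if git check-ignore -q data.txt; then ignored_after=yes; else ignored_after=no; fi
echo "after git rm --cached, ignored: $ignored_after"
```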
¹ "Obvious" meaning "not obvious at all, except after you've written some pre-receive hooks."
Worked for me when I added the line
Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt
to .gitignore.
I suspect you are unable to push because of your previous commit. Though you are removing the file locally, I think a commit containing it was created earlier. Check 'git log' and see whether the commit in which you checked in the txt file is there.
If so, execute
git reset --soft HEAD^
to remove that commit (if it is the latest commit).
After resetting the commit, run 'git status'. If you see the file in the:
"Changes to be committed:" area, execute "git reset HEAD <file>".
"Changes not staged for commit:" area, execute "git checkout -- <file>" (note that this discards your local changes to that file).
Run 'git status' again and check that the file is in the "Untracked files:" area.
Then add the file to .gitignore. It'll work.
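The steps in this answer, run on a throwaway repository where the data file was added in the most recent, not-yet-pushed commit (the repository and commit messages are invented for the demo):

```shell
#!/bin/sh
# Sketch: undo the last (unpushed) commit, untrack the big file, then ignore it.
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "readme" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master

big="Exploratory Data Analysis/Course Project 1/household_power_consumption.txt"
mkdir -p "$(dirname "$big")"
echo "Date;Time;Global_active_power" > "$big"
git add "$big"
git commit -q -m "add data file"        # the commit the push keeps rejecting

git reset -q --soft HEAD^               # drop the commit; its changes stay staged
git status --short                      # the file shows up as staged for addition
git reset -q HEAD -- "$big"             # unstage it: now under "Untracked files"
printf '%s\n' "$big" >> .gitignore      # keep it out of future commits
git add .gitignore
git commit -q -m "ignore the data file"
```

This only works when the offending commit is the latest one; if the file was added further back, the interactive rebase described in the other answer is needed.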