Juan David

Reputation: 2797

.gitignore not ignoring at all

I've created a directory called Exploratory Data Analysis/Course Project 1/ with the file household_power_consumption.txt inside. When I tried to push it to my git repo for this project, I received a warning about the size of household_power_consumption.txt, so I added a .gitignore file with this line:

Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt

I also tried to follow the solutions posted here, here and here, but none of them work for me. I always receive the same error:

$ git push origin master
Counting objects: 31, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (28/28), done.
Writing objects: 100% (31/31), 20.48 MiB | 221.00 KiB/s, done.
Total 31 (delta 6), reused 0 (delta 0)
remote: error: GH001: Large files detected.
remote: error: Trace: 491a8219bf1d3de4fd08a8e3ea253faa
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File Exploratory Data Analysis/Course Project 1/household_power_consumption.txt is 126.80 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/jd901215/DataScience_CourseraSpecialization.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/jd901215/DataScience_CourseraSpecialization.git'

This is the local directory tree for my project:

├── Exploratory Data Analysis
│   └── Course Project 1
│       └── household_power_consumption.txt
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── FETCH_HEAD
│   ├── HEAD
│   ├── hooks
│   │   ├── applypatch-msg.sample
│   │   ├── commit-msg.sample
│   │   ├── post-update.sample
│   │   ├── pre-applypatch.sample
│   │   ├── pre-commit.sample
│   │   ├── prepare-commit-msg.sample
│   │   ├── pre-push.sample
│   │   ├── pre-rebase.sample
│   │   └── update.sample
│   ├── index
│   ├── info
│   │   └── exclude
│   ├── logs
│   │   ├── HEAD
│   │   └── refs
│   │       ├── heads
│   │       │   └── master
│   │       └── remotes
│   │           └── origin
│   │               └── master
│   ├── objects
│   │   ├── (Bunch of SHA-1 checksums I guess)
│   │   ├── info
│   │   └── pack
│   ├── ORIG_HEAD
│   └── refs
│       ├── heads
│       │   └── master
│       ├── remotes
│       │   └── origin
│       │       └── master
│       └── tags
├── .gitignore
├── LICENSE
├── README.md
└── R programming 
    ├── README.md
    ├── Week 1 Programming assignment
    │   ├── complete7.R
    │   ├── complete.R
    │   ├── corr.R
    │   ├── pollutantmean.R
    │   ├── .RData
    │   ├── .Rhistory
    │   └── specdata
    ├── Week 2 Programming assignment
    │   ├── cachematrix.R
    │   └── README.md
    └── Week 4 Programming assignment
        ├── best.R
        ├── hospital-data.csv
        ├── Hospital_Revised_Flatfiles.pdf
        ├── outcome-of-care-measures.csv
        ├── .Rhistory
        └── rprog-doc-ProgAssignment3.pdf

Sorry for the long post, but I'm trying to provide all the necessary info. Thanks in advance.

Upvotes: 0

Views: 893

Answers (2)

torek

Reputation: 487755

First, it's worth noting that .gitignore entries don't mean what people usually think they mean, at first. I'll get back to this (much) later.

It's obvious[1] that this output line:

remote: error: Trace: 491a8219bf1d3de4fd08a8e3ea253faa

is telling you which commit, in the set of commits you're trying to push, has the overly large file in it.

To get here, though, you have to know a bunch of things about git that are often poorly-explained in various documents. (For a good explanation, see the Git Book.) In this particular case, one thing to know is that git push calls up a "remote", another computer that has its own separate git repository on it, and then your git asks that other git to take any of your new commits and add them to its own repository.


In this case, you and your git call the remote "origin", which is the standard name for "the place I cloned from originally".

For illustration, here's what happens with a simple case of cloning an origin in which there are just three commits, on branch master:

their git: C1 <- C2 <- C3 <-- master

you: $ git clone <url>

your git: (uses Internet-phone to call their git) "Here's a URL, what do you have?"

their git: "I have master which points to commit C3"

your git: "OK, gimme, and oh, I see I also need C2 and C1"

their git: (gives your git a bundle of everything)

your git: "kthxbye!" (unpacks everything, creates new repo that's the same as what you got from their git, plus the remote-name "origin" and the url)

Now that you have the clone, you do some work and make some commit(s). Let's call them C4 and C5. Each of these C<digit>s stands in for one of those big ugly 40-character SHA-1 hashes. Each commit "points back" at its parent commit, so C3 points to C2 and C2 points to C1. (Since C1 is the first commit, it has no outbound arrow: a commit points to zero-or-more parents, and the initial commit is the one with zero.) Let's draw in C4 and C5 now:

C1 <- C2 <- C3 <- C4 <- C5   <-- master

Note that the branch label master now points to commit C5, no longer to C3.

(Your git will have created an additional label, origin/master, pointing to C3. This is how your git can tell that you are "ahead 2" commits. Since your git is not always on the Internet-phone keeping up to date with their git, though, this information can get stale. But we'll ignore that for now.)
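If you want to see the "ahead 2" bookkeeping for yourself, here is a small sketch in a throwaway repository. Everything in it (the temporary directory, file.txt, the commit messages, the local bare repository standing in for "their git") is made up for illustration; only the git commands themselves are real.

```shell
# Throwaway demo: all paths and names are invented for illustration.
demo=$(mktemp -d)
cd "$demo"

# "Their" repository, standing in for the remote.
git init -q --bare origin.git

# "Your" repository, on branch master, with the remote named origin.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin ../origin.git

# C1..C3 exist on both sides once they are pushed.
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "C$n"
done
git push -q -u origin master

# C4 and C5 exist only in your repository.
for n in 4 5; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "C$n"
done

# Count the commits reachable from master but not from origin/master.
ahead=$(git rev-list --count origin/master..master)
echo "ahead by $ahead"

# The labels master and origin/master show up next to C5 and C3.
git log --oneline --decorate
```

The origin/master..master range is the same set of commits that git push origin master would offer to the remote.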

Eventually, you decide to:

$ git push origin master

This tells your git to get back on the Internet-phone to their git. The conversation now is a bit different:

your git: "I have some stuff for you."

their git: "Hm, well, gimme the stuff and I'll see. I have master and it's commit C3."

your git: "OK, since you're on C3, here's my C4 and C5."

their git: (starts doing checking, in this case, running a "pre-receive hook")

The "pre-receive hook" is arbitrary code that they (whoever they are) wrote. It can do whatever they want it to do, but in this case, it obviously (there's that word again :-) ) checks the new commits you're handing over. At least one of those two commits obviously has, in its associated source tree, a large file.

This arbitrary code could just say "no", but it's printing specifics about why it's saying "no". The sensible thing to print is the identity of the commit that has the large file, because commit IDs are unique, even across the Internet-phone, so you will have that same ID in your repository and can use it to see where things went wrong.

In this example, that's either commit C4 or commit C5. (The large file could be in both commits; the remote might stop after finding whichever one it finds first, not telling you about the second. But in any case it's just telling you about one commit.)
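To make "arbitrary code" a little more concrete, here is a minimal sketch of the kind of check such a hook might run. This is not GitHub's actual hook, which I have never seen; the function name reject_large_files, the 1024-byte limit, and the demo files are all hypothetical. The only parts taken from git itself are the "<old> <new> <ref>" lines a pre-receive hook gets on its standard input and the rev-list / ls-tree commands.

```shell
zero=0000000000000000000000000000000000000000   # all-zero ID means "no commit"

# Hypothetical hook logic: read "<old> <new> <ref>" lines on stdin and fail
# if the tree of any newly pushed commit holds a file larger than $1 bytes.
reject_large_files() {
    limit=$1
    while read -r old new ref; do
        [ "$new" = "$zero" ] && continue            # ref deletion: nothing to check
        if [ "$old" = "$zero" ]; then range=$new; else range="$old..$new"; fi
        for commit in $(git rev-list "$range"); do
            # ls-tree -l prints: mode type id size <TAB> path
            big=$(git ls-tree -r -l "$commit" |
                  awk -v limit="$limit" '$2 == "blob" && $4 > limit')
            if [ -n "$big" ]; then
                echo "error: commit $commit has a file over $limit bytes:" >&2
                echo "$big" >&2
                return 1
            fi
        done
    done
    return 0
}

# Demo in a throwaway repository (all names invented for illustration).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo hello > small.txt
git add small.txt
git commit -q -m "C4: small file"
c4=$(git rev-parse HEAD)

head -c 2048 /dev/zero > big.bin
git add big.bin
git commit -q -m "C5: large file"
c5=$(git rev-parse HEAD)

# Pushing only C4 would pass; pushing C5 on top of it would be refused.
echo "$zero $c4 refs/heads/master" | reject_large_files 1024
ok_status=$?
echo "$c4 $c5 refs/heads/master" | reject_large_files 1024 2>/dev/null
bad_status=$?
echo "C4 alone: $ok_status, C4..C5: $bad_status"
```

A real hook would exit non-zero to make the remote print "pre-receive hook declined", which is the rejection you saw.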

To see that particular commit:

$ git log 491a8219bf1d3de4fd08a8e3ea253faa

or:

$ git show 491a8219bf1d3de4fd08a8e3ea253faa

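If git does not recognize that value (the Trace string is only 32 hex digits, while a full commit ID is 40, so it may not be a commit ID at all), or if you just want an overview, use: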

$ git log --name-status

which shows you files added, deleted, or changed (by comparing each commit to its parent-commit).
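As a sketch of how to narrow that down to one path, the following throwaway demo asks git which commit added a given file. The directory "data dir", the file names, and the commit messages are invented for illustration; --diff-filter=A simply restricts the log to commits that added the path.

```shell
# Throwaway demo: all paths and names are invented for illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "readme" > README.md
git add README.md
git commit -q -m "C3: readme"

mkdir "data dir"
echo "pretend this is huge" > "data dir/big file.txt"
git add "data dir/big file.txt"
git commit -q -m "C4: add data"

echo "more" >> README.md
git commit -q -am "C5: update readme"

# Which commit added this path? --diff-filter=A keeps only additions.
added_in=$(git log --diff-filter=A --format=%s -- "data dir/big file.txt")
echo "added in: $added_in"

# Full picture: each commit with its added (A), modified (M), deleted (D) files.
git log --name-status --format='%h %s'
```

In your own repository, git log --name-status origin/master..master limits the listing to the commits you have not pushed yet.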


Now, the real question is how to deal with this. In this particular case it's probably easiest to use an interactive rebase, as shown in the Rewriting History section of the Git Book. What you need to do here is to replace the original "bad" commit(s) with "good" commits.

(Git doesn't ever actually replace commits, it only adds new commits, but once you have the new ones, you and your git can pretend the old ones don't exist. Eventually—by default, in 30 days or so—the old forgotten-about commits "expire" and will be removed, but they're preserved until then. This preservation means you can always recover from any mistakes made during the interactive rebase.)

Let's say, for illustration, that the error happened in commit C4 (if it happened in C5 it's a lot easier to fix, and this whole interactive-rebase thing is not required, although it will still work). We're further going to say—this seems reasonable since your git push is failing—that you have not pushed C4 and C5 anywhere else yet. (If you have, life gets more painful for whoever got your current C4 and C5 commits.)

You start by running git rebase -i. Since you are on master and its "upstream" branch (as git calls it) is origin/master, this means "rebase master onto origin/master": that is, copy commits C4 and C5 onto the tip of origin/master to make new, copied-but-slightly-changed commits C4' and C5':

                 C4 <- C5  <-- [master, before rebase]
               /
C1 <- C2 <- C3             <-- origin/master
               \
                 C4' ...   <-- temporary rebase branch

When the rebase commands come up in your editor, change the pick line for commit C4 to an edit line. Write out the instructions and exit the editor. (You don't need to do anything for C5, since it's going to be the tip-most commit once the rebase finishes. If you need to change it you can simply git commit --amend it.)

The interactive rebase cherry-picks the original commit and then stops. At this point, commit C4' is effectively the same as C4 (except sometimes for some time stamps and other author/committer meta-data). You can now "amend" the commit.

Since the commit adds a file you don't want, you need to git rm it first, perhaps with git rm --cached to leave the file in your work-tree:

$ git rm --cached Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt

If it seems necessary or appropriate, you might also add that path to a .gitignore file, and git add the .gitignore. This file only buys you a bit of convenience, so it's up to you whether to do that.

Now you're ready to use git commit --amend to make the modified commit C4' permanent, before continuing the rebase:

$ git commit --amend --no-edit
[master d106870] blah blah ...
$ git rebase --continue

(The --no-edit flag skips opening the editor on the commit message. If you want to make a change to the commit message too, leave it out.)

The continue step simply finishes off the rebase by copying commit C5, adding it to the tip of your temporary branch:

                 C4 <- C5  <-- [master, before rebase]
               /
C1 <- C2 <- C3             <-- origin/master
               \
                 C4' - C5' <-- temporary rebase branch

and then—as all rebases do—erasing the label master and pasting it onto the temporary branch:

C1 <- C2 <- C3             <-- origin/master
               \
                 C4' - C5' <-- master

Now that you have a set of commits that don't add the large file, you can re-run git push origin master. Your git and their git will re-do their conversation, but this time, your git will hand over commits C4' and C5', which omit the big file. Your git will then ask their git to label C5' as branch master, and presumably this time, it will be OK.
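Here is the whole sequence as a scripted sketch in a throwaway repository, in case you want to rehearse it before touching the real one. The file names (plot1.R, plot2.R, data.txt) are invented, the local bare repository has no size-checking hook, and setting GIT_SEQUENCE_EDITOR to a sed command is just my stand-in for changing "pick" to "edit" by hand in your editor.

```shell
# Throwaway demo: all paths and names are invented for illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q --bare origin.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin ../origin.git

echo "base" > README.md
git add README.md
git commit -q -m "C3"
git push -q -u origin master

# C4 adds a script plus the unwanted data file; C5 is an unrelated follow-up.
echo "plot1()" > plot1.R
echo "pretend this is 126 MB" > data.txt
git add plot1.R data.txt
git commit -q -m "C4: add plot1.R and data"
echo "plot2()" > plot2.R
git add plot2.R
git commit -q -m "C5: add plot2.R"

# Stand-in for editing the todo list by hand: turn the first "pick" (C4)
# into "edit" so the rebase stops right after copying C4.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i origin/master

# Stopped at C4': drop the file from the index but keep it in the work-tree.
git rm -q --cached data.txt
echo "data.txt" >> .gitignore
git add .gitignore
GIT_EDITOR=true git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue

# No commit on master touches data.txt any more, but the file is still on disk.
commits_with_data=$(git rev-list --count master -- data.txt)
echo "commits touching data.txt: $commits_with_data"
ls data.txt

git push -q origin master
```

Note that C4 in this sketch adds a second file on purpose: if removing the big file would leave the amended commit with no changes at all, git commit --amend refuses unless you pass --allow-empty.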


Back to the .gitignore file: I sometimes think this file is mis-named, because these are not files that git ignores. In particular, if a file is already tracked—is in the index—git won't ignore it. So if you git add-ed a file by mistake, committed that, then put it in .gitignore, it's still in the earlier commit. This is true even if you git rm-ed it since then.

(Entries in .gitignore do two things, and often the more-important one is "keep git from griping about this file being untracked", so that instead of .gitignore, a better name might be .git-shut-up-with-your-untracked-noise, or some such. But entries in this file also keep git add from adding the file to the index, if it's not already in the index, so it also needs the name .git-dont-add-files or something like that. So, we just use .gitignore and have to understand that it doesn't really mean "ignore".)
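A quick way to convince yourself of both points is the following sketch, again in a throwaway repository with invented file names (tracked.txt, scratch.txt). It lists one already-tracked file and one untracked file in .gitignore and then checks how git treats each.

```shell
# Throwaway demo: all paths and names are invented for illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# tracked.txt gets committed "by mistake" before it is listed in .gitignore.
echo "data" > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt by mistake"

printf 'tracked.txt\nscratch.txt\n' > .gitignore
echo "more" >> tracked.txt
echo "tmp" > scratch.txt

# tracked.txt still shows up as modified; scratch.txt is not mentioned.
git status --short

# The file is still in the index despite the .gitignore entry.
tracked=$(git ls-files tracked.txt)
echo "still tracked: $tracked"

# For the untracked file, the entry does stop a plain "git add".
git add scratch.txt 2>/dev/null
add_status=$?
echo "git add scratch.txt exit status: $add_status"
```

To stop tracking a file going forward you still need git rm --cached, and to get it out of existing commits you need to rewrite them, as in the rebase above.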


[1] "Obvious" meaning "not obvious at all, except after you've written some pre-receive hooks."

Upvotes: 1

MadCoder

Reputation: 641

Worked for me when I added the line

Exploratory\ Data\ Analysis/Course\ Project\ 1/household_power_consumption.txt

to .gitignore

I suspect you are unable to push because of your previous commit. Though you are removing it locally, I think a commit was created previously with that file. Check 'git log' and see whether the commit in which you checked in the txt file is there.

If so, execute

git reset --soft HEAD^

to remove that commit (if it is the latest commit).

After resetting the commit, in 'git status', if you see the file in:

    "Changes to be committed:" area, execute "git reset HEAD <file>".
    "Changes not staged for commit:" area, execute "git checkout -- <file>".

Do 'git status' and check that the file is in "Untracked files:" area.

Then, add the file to .gitignore. It'll work.
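A sketch of those steps in a throwaway repository, assuming the offending commit is the latest one and has not been pushed. The file names (README.md, big.txt) are invented for illustration, and git status --porcelain is used only because its output is easy to read in a script.

```shell
# Throwaway demo: all paths and names are invented for illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "readme" > README.md
git add README.md
git commit -q -m "initial"

echo "pretend this is huge" > big.txt
git add big.txt
git commit -q -m "add big.txt by mistake"

# Drop the latest commit but keep its changes staged.
git reset -q --soft HEAD^
after_soft=$(git status --porcelain)
echo "after soft reset: $after_soft"

# Unstage the file; it stays on disk and becomes untracked.
git reset -q HEAD -- big.txt
after_unstage=$(git status --porcelain)
echo "after unstaging:  $after_unstage"

# Only now does the .gitignore entry have any effect.
echo "big.txt" >> .gitignore
git add .gitignore
git commit -q -m "ignore big.txt"

commits_with_big=$(git rev-list --count HEAD -- big.txt)
echo "commits touching big.txt: $commits_with_big"
```

If the file was added in an older commit rather than the latest one, this shortcut does not apply and you need to rewrite that commit instead.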

Upvotes: 1
