Matan

Reputation: 117

Steps to use .gitignore and .git/info/exclude

I'm having trouble understanding how to use and write to a .gitignore, .git/config file and .git/info/exclude file.

I've read through this post and this documentation, but it's just not clicking for me.

My understanding is that:
- .gitignore is to be used when you are okay with the file being pushed to GitHub.
- .git/info/exclude is used to keep files or folders from being pushed to GitHub. So this is where you would hide secrets like API keys, or put large folders of photos that do not need to be on GitHub but just on my machine for deep learning training.
Is this not accurate?

How can I write to the .gitignore or .git/info/exclude files using the command line or Git Bash? I've attempted many times by experimenting with syntax but am at a loss now. I also cannot figure out how to open them through the Jupyter interface.

Upvotes: 1

Views: 2680

Answers (3)

Serge

Reputation: 12344

The main difference between .gitignore and info/exclude is that the first is versioned and can be checked in and checked out together with other files. Therefore it persists across multiple clones of the repository.

info/exclude is not version controlled and affects only a single repository. You would need to physically copy it to another repository to have an effect there.

Both files are used to prevent git from tracking files that are not yet tracked. They do not affect files which are already committed.

The core.excludesFile setting is yet another mechanism, which can be used to specify a global ignore file that applies to all of your repositories.

The documentation specifies the order in which git considers the files: .gitignore, info/exclude, then core.excludesFile.
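
As a rough sketch of the third mechanism, the commands below point core.excludesFile at an ignore file kept outside the repository. The temporary repository, the *.log pattern and the file names are assumptions made up for the demo, and the setting is written to the repository's own config only so the sketch does not touch your global settings.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Hypothetical ignore file kept outside the repository.
ignore_file="$(mktemp)"
echo '*.log' > "$ignore_file"

# Repo-local setting here so the demo leaves your global config alone;
# for a real user-wide file you would pass --global instead.
git config core.excludesFile "$ignore_file"

touch debug.log keep.txt
status_out="$(git status --porcelain)"
echo "$status_out"         # only keep.txt is reported as untracked
```

For everyday use you would more likely run git config --global core.excludesFile with a path in your home directory.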

An example of when you need such a file is a build which generates .o files in the same worktree as the source files. The *.o pattern could be put in any of those files. It will make most git commands forget about the existence of those files: they will not be listed by 'git status' and will not be added by 'git add'.

As such, the files can only be used to prevent you from adding untracked files to your local git repository. But as soon as you manage to commit the untracked files, nothing will prevent them from being pushed.
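
To the question of how to write these files from a shell: both are plain text, so appending a line with echo is enough. This is a minimal sketch in a throwaway repository; main.c, main.o and notes.local are hypothetical file names chosen for illustration.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Hypothetical source file, build product, and personal scratch file.
touch main.c main.o notes.local

# Shared, versioned rule: append a pattern to .gitignore.
echo '*.o' >> .gitignore

# Private, per-repository rule: append to .git/info/exclude.
# (mkdir -p is only a precaution in case the info directory is missing.)
mkdir -p .git/info
echo 'notes.local' >> .git/info/exclude

# main.o and notes.local are no longer reported; main.c and .gitignore are.
status_out="$(git status --porcelain)"
echo "$status_out"
```

Any text editor works just as well; the files need no special syntax beyond one pattern per line.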

Upvotes: 1

torek

Reputation: 487735

my understanding is that:

  • .gitignore is to be used when you are okay with the file being pushed to git hub.
  • use .git/info/exclude to keep files or folders from being pushed to git hub. [snip]

Is this not accurate?

This is not accurate. Assuming you understand just what untracked file means: what both .gitignore and .git/info/exclude do for you is to prevent accidentally tracking a file, and to shut Git up from complaining about the untracked files. That's pretty much it. Once a file is tracked, the .gitignore and .git/info/exclude files no longer affect it.

Files can become tracked and untracked at various times, though. There's a lot to know here.

Untracked files

The phrase untracked file has a very specific meaning in Git. I would point to the gitglossary here, but unfortunately untracked file is not defined there (!). So, I will use my own definition:

  • untracked file
    An untracked file is any file that is in the working tree right now, but is not in the index right now.

We'll come back to why I say right now twice, in a bit. First, though, this uses two more terms, index and working tree, that this time are defined in the gitglossary:

  • index
    A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. ...

  • working tree
    The tree of actual checked out files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

To get a good picture of all of this in your head, start with this: A Git repository is mainly composed of two databases. One holds all your commits and other Git objects (essentially, the rest of the stuff you need for a commit: file names and file contents). The other database, generally a lot smaller, holds your branch and tag name and other such names. The smaller database maps names to hash IDs.

You will have seen hash IDs in git log output:

commit 745f6812895b31c02b29bdfe4ae8e5498f776c26
Author: Junio C Hamano ...

(This particular commit is in the Git repository for Git, which you can clone from http://github.com/git/git/ if you like. Here is that commit; if you clone the repository, you will have this commit.)

Every Git object has its own hash ID. Each commit has a hash ID that is unique to that one particular commit. That hash ID can never be used for any other commit. Every other commit gets its own, different hash ID. A branch name, like master, holds one (1) hash ID, and to add commits to a branch, Git stuffs a new hash ID into the name master, so branch names change which hash ID they mean, over time.

Meanwhile, the hash ID for each commit is totally static and permanent. Once you have commit 745f6812895b31c02b29bdfe4ae8e5498f776c26, commit 745f6812895b31c02b29bdfe4ae8e5498f776c26 is always that commit. You either have it in your database of all-Git-objects, and then it's that commit, or you don't have it at all. Nothing about this commit can ever be changed. You either have it, and then it's exactly the same as the 745f6812895b31c02b29bdfe4ae8e5498f776c26 that I linked to above, or you don't have it.
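
A small sketch of the name-to-hash-ID mapping, run in a throwaway repository. The commit helper function, the demo identity and file.txt are assumptions for illustration only.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo one > file.txt
git add file.txt
commit -m first

branch="$(git symbolic-ref --short HEAD)"   # whatever the default branch is called
first="$(git rev-parse "$branch")"          # hash ID the name holds now

echo two > file.txt
git add file.txt
commit -m second
second="$(git rev-parse "$branch")"         # the same name now holds a new hash ID

# The old commit is unchanged and still reachable under its old hash ID.
first_type="$(git cat-file -t "$first")"
parent_of_second="$(git rev-parse "${branch}~1")"
echo "$first -> $second"
```

The branch name moved from the first hash ID to the second, while the first commit itself stayed exactly as it was.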

Now, commits contain files—as a full snapshot—so if you have that particular commit, you have all of those files. But the files inside any commit are in a special, read-only / frozen, Git-only, format. I like to call them "freeze-dried". These freeze-dried files can be reconstituted any time you like, but the defrosted and rehydrated files are just copies. The originals are still inside the original commit.1

This means commits are great for archival—they keep copies of every version of every file, forever—but they're utterly useless for actually getting any new work done. You must reconstitute the files in order to use them, and to do that, Git puts the reconstituted files into your working tree, or work-tree, or any name along these lines.


1 Technically, they're in additional Git objects, to which the commit merely points. Git is gradually acquiring a system by which they can be loaded on-demand rather than requiring that the commit come along with all of its dependencies, so someday you'll be able to have the commit without having the files too. But for now they might as well be directly in the commit, except for the fact that commits can share the freeze-dried copies of files, if they haven't changed from one commit to another.


The index

The working tree, or work-tree, defined above is pretty straightforward: Git's permanently-saved versions of files are in a special, read-only, Git-only format, so Git has to expand them out to ordinary, useful files, and put those into an area where you can see them and work with them. That's actually something pretty much every version control system does. Most just stop here—if there's anything more, the VCS keeps it hidden away. Git is not like the other version control systems. Git adds this thing it calls the index, and then promptly keeps shoving it in your face.

The index is also called the staging area, or sometimes—rarely these days—the cache. These are all just three names for the same thing. As the gitglossary says, it's a collection of files. In fact, initially, it's the files from the commit you checked out:

git checkout master

The name master is a branch name, so Git looks up the commit hash ID in its second, smaller database (of names to hash IDs). The checkout puts you on branch master and extracts that commit, whatever its hash ID is, so that you have all of its files in your work-tree. This extraction process obviously has to do:

  • read from commit
  • decompress / un-freeze / write to work-tree

If Git did just this—as other systems do—Git could use your work-tree, whatever changes you make to it, as your proposed next commit. When you run git commit, Git would have to look through your work-tree, re-compress everything, and see if it's changed from the previous commit or not. But instead, Git does this:

  • read from commit
  • write (frozen format, really hash ID) to index / staging-area
  • decompress / un-freeze / write to work-tree

Now when you run git commit Git can ignore your work-tree and just re-freeze straight from the index. If you made any changes in your work-tree, you must run:

git add whatever-file

to tell Git: take whatever-file from the work-tree and compress it down to the freeze-dried form and put that into the index, ready to be committed.

In other words, instead of your work-tree being your proposed next commit, it's actually your index that is your proposed next commit. You can change anything you want in your work-tree without affecting your next commit, because it's what is in your index / staging-area that matters.
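
One way to see this for yourself is the sketch below: stage one version of a file, edit it again without staging, and commit. The throwaway repository, the commit helper and f.txt are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo v1 > f.txt
git add f.txt
commit -m "first version"

echo staged > f.txt
git add f.txt              # this version goes into the index
echo unstaged > f.txt      # this later edit stays in the work-tree only

commit -m "second version"

committed="$(git show HEAD:f.txt)"   # what the new snapshot contains
worktree="$(cat f.txt)"              # what is on disk
echo "committed=$committed worktree=$worktree"
```

The new commit holds the staged text, not the later work-tree edit.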

Both the index and your work-tree are per-repository temporary areas

What really matters, in Git, is the commits. The big database, of all commits (and their files), is what Gits exchange between each other. The smaller database, of names like branch and tag names—Gits use that to send each other commits, and Gits will send each other their names (which you can override if you like) so that the other Git can use that name, if it likes, to remember the commits. But it's the commits that matter.

Your work-tree is yours, to do with as you will. Git doesn't really use it, with a few exceptions:

  • git checkout copies from commits, to the index and then your work-tree. (It has other modes that do more things. In some ways, it has too many modes, which is why in Git 2.23 there are two commands, git switch and git restore, that, put together, can do what git checkout can do all in one command.)
  • git reset copies from commits to the index (and sometimes to the work-tree too). (Like git checkout, it has a lot of things it can do—perhaps too many.)
  • git add copies from your work-tree to the index.
  • git status looks at your work-tree. It looks first at the current commit, then at the index, and then at your work-tree.

It's at these last two commands—git add and git status—that .gitignore and friends start to become useful. But let's mention one more command:

  • git rm removes files from both the index and your work-tree. It has a mode where it only removes from the index, leaving the work-tree file alone.

When you run git commit, Git packages up the index contents into a new snapshot—a new commit—which then becomes the current commit, so now the commit and the index match again. The work-tree is not changed during this process: if it used to match the index, it still does; if it didn't match the index, it still doesn't.

Hence, the index is a temporary area where you build the next commit, by staging files into it as desired. All the files you didn't stage explicitly, but were already in the index because of the previous commit, are still there. They go into the new snapshot.

How tracked / untracked works with git add and git status

What git status does—well, a big part of what it does, and all that we're concerned with right here—is to run two git diffs with, in effect, the --name-status option, and then display the results in a more useful form:

  • The first diff compares the current commit to the index. Whatever is the same, Git says nothing. For each file that is different, Git tells you staged for commit. So updated files in the index, which now differ from their HEAD version, are called staged. Git is silent about other files.

  • The second diff compares the index—the ready-to-be-committed, staged files—to the work-tree. Whatever is the same, Git says nothing. For each file that is different, Git tells you not staged for commit. So updated files in the work-tree, which now differ from their ready-to-commit staged copies, are called not staged.
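
The two comparisons can be run by hand, roughly like this. It is only a sketch in a throwaway repository; the commit helper, a.txt and b.txt are hypothetical names.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo a > a.txt
echo b > b.txt
git add a.txt b.txt
commit -m base

echo a2 > a.txt
git add a.txt              # a.txt: index now differs from the current commit
echo b2 > b.txt            # b.txt: work-tree now differs from the index

# First diff: current commit vs. index ("staged for commit").
staged="$(git diff --cached --name-status)"
# Second diff: index vs. work-tree ("not staged for commit").
unstaged="$(git diff --name-status)"

echo "staged:   $staged"
echo "unstaged: $unstaged"
```

Here a.txt shows up only in the first diff and b.txt only in the second.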

But you can have files in your work-tree that aren't in your index right now. How did they get there? Well, Git filled in your index—your staging area—when you ran git checkout. If a file named foo.config wasn't there in the commit, it isn't there in the index. If it wasn't written to the index, it wasn't written to your work-tree either. But maybe something you ran created it in your work-tree. Maybe you even worked with it. So now it's there, in your work-tree, but not in your index.

What git status will do is complain about this file. It will say: untracked file. If you want git status to shut up about this file, you can list it in a .gitignore or a .git/info/exclude, and git status won't complain.

This has no effect on whether foo.config is in the index. We already said that it's not in the index, so it's still not in the index when git status doesn't complain. But that still leaves git add. If you run:

git add foo.config

you're telling Git: freeze-dry the work-tree copy and put that in the index. If you do that without any .gitignore or similar, Git will obey, and foo.config will now be in your index.

If it's not in the current commit, git status will tell you that it's newly added and ready to be committed (staged for commit), too. It's there, in the index, so it will be in the next commit.

If you don't want it to be in the next commit, you have to remove it from the index:

git rm foo.config

and now it's gone from the index and from your work-tree. If you didn't want it gone from your work-tree, use:

git rm --cached foo.config

and now it's gone from the index, but still in your work-tree, and now git status will complain about it as untracked. You might add it to a .gitignore or exclude file, so as to stop the complaining.
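
A sketch of that sequence, using a throwaway repository and a made-up foo.config:

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

echo 'setting=1' > foo.config
git add foo.config
in_index_before="$(git ls-files)"        # foo.config is in the index: tracked

git rm -q --cached foo.config            # remove from the index only
in_index_after="$(git ls-files)"         # index no longer lists it
status_untracked="$(git status --porcelain)"   # complains: ?? foo.config

# List it in the private exclude file to stop the complaint.
mkdir -p .git/info
echo 'foo.config' >> .git/info/exclude
status_quiet="$(git status --porcelain)" # nothing reported now

echo "before: $in_index_before"
echo "status after rm --cached: $status_untracked"
```

The file stays on disk throughout; only its presence in the index changes.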

Adding it to the exclude file while it's untracked (i.e., not in the index, but in the work-tree) has one other beneficial side effect. Besides the kind of git add above, you can do:

git add .

or:

git add *

to add a whole bunch of files en-masse. If foo.config is currently untracked (i.e., not in the index but in the work-tree) and is listed in an ignore file, and git add comes across an attempt to add it, git add won't add it by default. So it will remain untracked.

Note, however, that if you force the file into the index in any way—such as, for instance, by doing git checkout of a commit that does have foo.config in it—the file is now tracked, because it's in the index. A file that is in the index—however it got that way—is tracked. A file that isn't in the index—however it got that way—is untracked.
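
The sketch below shows both halves: an en-masse add skips an ignored file, and once the file is forced into the index the ignore rule stops mattering. It uses git add -f as the forcing step rather than a checkout, purely to keep the demo short; secret.key and app.py are hypothetical names.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

echo key > secret.key
echo code > app.py
echo 'secret.key' > .gitignore

git add .                                  # en-masse add skips the ignored file
tracked="$(git ls-files)"                  # .gitignore and app.py only

git add -f secret.key                      # force it into the index: now tracked
echo changed > secret.key
modified="$(git diff --name-only)"         # Git reports the change despite .gitignore

echo "tracked after 'git add .': $tracked"
echo "modified after forcing:    $modified"
```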

Putting it all together

... hide secrets like API keys or put large folders of photos that do not need to be in git hub ...

When you git push to GitHub, you send it commits. Whatever snapshots are in those commits, go to GitHub. If you want to be sure that secret.key and big.file don't go, you need to make sure they don't go into the index and hence do not go into any new commits.

To keep them from going into the index, they should start out that way: not in the index, and not in any existing commit that would put them into the index on a git checkout. If they are not in any existing commit, you're in good shape to start with. If they are already in some commits, you can either avoid checking out those commits or (and this is kind of hard) actually get rid of those commits.

To keep from accidentally putting the files into the index and hence into new commits, simply list the two file names in either or both of a .gitignore or a .git/info/exclude. If you put the names into a .gitignore, you can then git add .gitignore and make a new commit. From now on, that .gitignore is in that commit, and gets copied out of that commit into the index for every additional new commit you make from that commit. So the two names secret.key and big.file won't accidentally get into the index, and hence won't accidentally get into new commits.

You don't have to do either of these. All you have to do is make sure that no new commits make snapshots of those files. New commits make snapshots from the index, so you just have to make sure you don't accidentally copy secret.key and/or big.file into the index.

If for some reason you really want those files in some commit(s), you can do that, but now you have to make sure that those commits (the ones that have snapshots of secret.key and big.file) never go to GitHub. That's a fair bit harder. You're better off putting these files in a separate repository, if you want to version-control them at all.

The dot-files, .gitignore and .git/info/exclude, are really about:

  • keeping files that are currently untracked, untracked in the future too
  • without having to do that carefully / by hand
  • without git status whining about them

The .gitignore file itself can be put in the index, so that there are copies in each commit snapshot. Since new clones use the original commits, new clones have those. The .git/info/exclude file cannot be put in the index, so there are no copies of that in any snapshot, and new clones will not have it.
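
A local clone is enough to see the difference. In this sketch the origin repository, the commit helper, and the files main.o and big.file are all made up for the demo.

```shell
set -e
origin="$(mktemp -d)"      # throwaway "original" repository
cd "$origin"
git init -q .

# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo '*.o' > .gitignore                    # versioned rule
mkdir -p .git/info
echo 'big.file' >> .git/info/exclude       # private rule, never committed
git add .gitignore
commit -m "add ignore rules"

clone="$(mktemp -d)/clone"
git clone -q "$origin" "$clone"
cd "$clone"

touch main.o big.file
clone_status="$(git status --porcelain)"
echo "$clone_status"       # main.o is ignored; big.file is reported as untracked
```

The clone received .gitignore with the commit, but the exclude rule stayed behind in the original repository.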

Upvotes: 2

wcarhart

Reputation: 2773

Files and folders listed in your .gitignore file are NOT tracked by git (unless they were already tracked before being listed), and thus are not pushed to GitHub.

If I wanted git to IGNORE a file, my .gitignore would look like so:

file_to_ignore.txt

You can ignore folders too:

folder_to_ignore/

If I wanted git to ignore everything EXCEPT for a file, my .gitignore would look like so:

*
!file_to_include.txt

* is a wildcard, which means everything. ! means do not ignore this file/folder.
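
If you want to check what a pattern set does before relying on it, something like the following sketch can help. It uses git check-ignore, which exits with 0 when a path is ignored; the throwaway repository and other.txt are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q .

# Ignore everything except one file.
printf '%s\n' '*' '!file_to_include.txt' > .gitignore
touch file_to_include.txt other.txt

# Only the re-included file is reported (.gitignore itself matches '*').
status_out="$(git status --porcelain)"

# Ask Git directly whether each path is ignored.
if git check-ignore -q other.txt; then other_ignored=yes; else other_ignored=no; fi
if git check-ignore -q file_to_include.txt; then included_ignored=yes; else included_ignored=no; fi

echo "$status_out"
echo "other.txt ignored: $other_ignored"
echo "file_to_include.txt ignored: $included_ignored"
```

Note that with a bare * the .gitignore file itself is ignored too, so you would need git add -f .gitignore (or another ! line) to commit it.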

Upvotes: 1
