Abdulwahab Almestekawy

Reputation: 664

Are files specified in .gitignore pushed (uploaded) to remote repo by default?

In my local repo, I have large files that I don't wish to commit or upload to the remote repo. If I add these files to .gitignore, will they still be uploaded to the remote repo by the git push command?

Upvotes: 0

Views: 1778

Answers (2)

torek

Reputation: 489083

The git push command sends commits, not files. Each commit then contains every file (that it contains; see below). So to answer the question you asked—whether git push uploads any of the files listed in the .gitignore—you must inspect the commits to see if they contain those files.
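
Here is a minimal shell sketch that checks one concrete case. It assumes a POSIX shell and Git on the PATH, and every file and directory name is made up. A throwaway bare repository stands in for the remote. Because big.bin is listed in .gitignore before it is ever added, git add . skips it, so the commit (and therefore the push) never contains it. git ls-tree -r --name-only lists the files that the pushed commit holds.

```shell
# Minimal sketch: a throwaway bare repo stands in for the remote.
# All names (remote.git, local, big.bin, main.c) are made up.
top=$(mktemp -d)
cd "$top"
git init -q --bare remote.git
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

# big.bin is listed in .gitignore *before* it is ever added.
echo "big.bin" > .gitignore
head -c 1024 /dev/zero > big.bin
echo "int main(void) { return 0; }" > main.c
git add .                     # skips big.bin: it is ignored and untracked
git commit -q -m "initial commit"

# git push sends commits; list the files inside the commit the remote received.
git push -q ../remote.git main
pushed=$(git --git-dir=../remote.git ls-tree -r --name-only main)
echo "$pushed"                # .gitignore and main.c, but no big.bin
```

On a real repository, you can run the same check with git ls-tree -r --name-only on each commit you are about to push. For example, you could check each commit shown by git log origin/main..main, assuming a remote named origin and a branch named main.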

The name .gitignore is inherently wrong: it does not make Git ignore files. The thing is, a correct name would be so unwieldy as to be unusable: it would be something like .git-do-not-auto-add-these-files-if-they-are-untracked-and-if-they-are-untracked-do-not-complain-about-them-being-untracked-when-I-run-git-status. It has one more effect, which is rare but important too: it gives Git permission to clobber those files in particular circumstances. So a fully-correct name would be even worse. But those two things, which are all about untracked files, are the two common ones.
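
The third, rarer effect (permission to clobber) can be seen in a sketch like the following. It assumes Git 2.23 or later for git switch, and the names build.log and with-log are made up. An untracked file that is not ignored blocks a switch that would overwrite it. Once the same file is ignored, Git treats it as expendable and overwrites it with the committed version.

```shell
# Minimal sketch (Git 2.23+ for git switch); build.log and with-log are made-up names.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "readme" > README.txt
git add README.txt
git commit -q -m "base"

# A second branch whose commit contains build.log.
git switch -q -c with-log
echo "committed log" > build.log
git add build.log
git commit -q -m "add build.log"
git switch -q main            # build.log is removed from the working tree

# An untracked, NOT ignored build.log: Git refuses to overwrite it.
echo "my local notes" > build.log
if git switch -q with-log 2>/dev/null; then first=switched; else first=refused; fi

# The same untracked file, now ignored via an untracked .gitignore:
# Git treats it as expendable and overwrites it.
echo "build.log" > .gitignore
if git switch -q with-log 2>/dev/null; then second=switched; else second=refused; fi
echo "$first / $second: build.log now says: $(cat build.log)"
```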

So: what is all this stuff about untracked files in the first place? What makes a file "untracked"? To answer this question, we have to start with the reason Git exists at all, which is the commit.

Git is all about commits

Those new to Git often think it's about files. It isn't: it's about commits. Commits do contain files, but Git usually deals in whole commits at a time. Or, if they don't think Git is about files, beginners often think that Git is about branches, and again, it is not: it's still really about commits. Branches—or more precisely, branch names—do matter, because they help us (and Git) find the commits. But it's the commits that really matter. If you're going to use Git at all, you must learn what a commit is and does for you.

Each commit:

  • Is numbered. The "number" of a commit is big, ugly, and random-looking. It's actually a universally-unique ID expressed in hexadecimal, such as dcc0cd074f0c639a0df20461a301af6d45bd582e. That number, once assigned to that commit, means that commit, and never any other commit. This is how two different Git repositories decide whether they have the same commit: they either have the number, in which case they have that commit, or one does and one doesn't, and then the one that doesn't have it needs to get the commit from the one that does.

  • Is completely read-only. All Git internal objects are this way—it's necessary to make the numbering scheme work.

  • Stores a full archive of every file that Git "knew about" at the time you, or whoever, made the commit. The files in the commit are stored indirectly, in a special, read-only, Git-only, compressed and de-duplicated form, so that two different files with the same content, or two commits that have the same file with the same content, store the file only once. So, while every commit has a full archive of every file, all the commits also share all the files, so that the repository doesn't get enormously bloated.

  • Also stores some metadata, or information about the commit itself. This includes the name and email address of the person who made the commit. It includes some date-and-time-stamps. It includes any commit log message you'd like to put in. And, crucially for Git's own internal operation, each commit stores a list of previous commit hash IDs. This list is usually just one entry long: that means this commit comes right after its listed parent commit.
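
A quick way to see these properties is to look at a raw commit object. This sketch uses a throwaway repository with made-up file names. It prints a commit's hash ID, then prints the commit's contents with git cat-file -p. That output shows the tree (the snapshot), the parent hash ID, the author and committer lines, and the log message. The sketch also checks that two files with identical content are stored as a single blob.

```shell
# Minimal sketch in a throwaway repo; a.txt and b.txt are made-up names.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "same content" > a.txt
echo "same content" > b.txt
git add a.txt b.txt
git commit -q -m "first"
echo "more" >> a.txt
git commit -q -a -m "second"

commit_id=$(git rev-parse HEAD)       # the commit's "number": a hash ID
commit_obj=$(git cat-file -p HEAD)    # tree, parent, author, committer, message
echo "$commit_id"
echo "$commit_obj"

# De-duplication: two files with identical content share one stored blob.
blob_a=$(git rev-parse HEAD~1:a.txt)
blob_b=$(git rev-parse HEAD~1:b.txt)
echo "$blob_a $blob_b"
```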

It's this list-of-parents, which strings commits together backwards, that forms the real branch information inside a repository. For instance, suppose we have a string of commits that ends at one with a hash we'll call H (for Hash). Commit H stores the hash ID of some earlier commit. We say that H points to this earlier commit:

            <-H

If we call the earlier commit G, and draw it in, we have:

        <-G <-H

Of course G also points to a still-earlier commit, which points backwards yet again, and so on:

... <-F <-G <-H

and that is a "branch". To find this branch, Git needs to know the actual hash ID of commit H, and we stick that in a branch name and say that the name points to H:

...--F--G--H   <-- main
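
You can inspect the backward-pointing chain and the branch name directly. This sketch uses a throwaway repository, and the commit messages F, G and H mirror the drawing. The name main resolves to H's hash ID. The ^ suffix follows a commit's first parent, just as Git does when it walks backwards.

```shell
# Minimal sketch in a throwaway repo; file.txt is a made-up name.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

# Make three commits, F then G then H, each on top of the previous one.
for name in F G H; do
  echo "$name" > file.txt
  git add file.txt
  git commit -q -m "$name"
done

H=$(git rev-parse main)      # the branch name stores H's hash ID
G=$(git rev-parse "$H^")     # H's parent, read from H's metadata
F=$(git rev-parse "$G^")     # G's parent
git log --format='%h %s' main  # walks backwards: H, G, F
```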

If we have multiple branch names, each name just points to one commit:

          I--J   <-- feature-1
         /
...--G--H   <-- main
         \
          K--L   <-- feature-2

Now there are three "last" commits: H is the last commit on main as before, but J is the last commit on feature-1, and L is the last commit on feature-2. Note that commits up through H are on all three branches (this is peculiar to Git; most version control systems don't work this way).
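
You can check the claim that commits up through H are on all three branches with git branch --contains. This sketch assumes Git 2.23 or later for git switch. commit_one is a hypothetical helper defined in the block, and the commit and branch names mirror the drawing above.

```shell
# Minimal sketch (Git 2.23+ for git switch) in a throwaway repo.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

# commit_one: hypothetical helper that makes one commit named after its argument.
commit_one() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_one G; commit_one H            # ...--G--H   <-- main
git switch -q -c feature-1
commit_one I; commit_one J            # I--J   <-- feature-1
git switch -q -c feature-2 main
commit_one K; commit_one L            # K--L   <-- feature-2

H=$(git rev-parse main)
on_H=$(git branch --format='%(refname:short)' --contains "$H")
on_J=$(git branch --format='%(refname:short)' --contains feature-1)
echo "$on_H"    # feature-1, feature-2, main: H is on all three branches
echo "$on_J"    # only feature-1
```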

If files in commits are read-only and Git-only, how can we work with them?

The fact is that each commit is read-only, and the files inside the commit can only be read by Git itself. So before we can use a commit, we have to have Git extract the commit. We do this with git checkout or git switch: we pick some commit, by hash ID like H or by name like main which provides Git with hash ID H, and say extract that commit:

git switch main

Git will read all the files from commit H and expand them into ordinary everyday files. Git puts these files in a work area, which Git calls our working tree or work-tree.

That, then, answers the question about how we get work done: we use the working tree copies of the files. These files are not in Git! They came out of Git—at least, the initial ones did—but they are not in Git at this point.

Git's index or staging area

After we work on some file in the working tree, we might want Git to use that file—plus all the other files that we didn't change—to make a new commit. In other version control systems, we would generally run their "make commit" verb:

hg commit

for instance. They would figure out which files we changed and make the new commit. Git, alas, does not make it this easy. Instead, Git demands that we run:

git add updated-file
git commit

The first command—the git add step—tells Git: Read the working tree file, compress it into your internal Gitty format with de-duplication, and insert that into your index / staging-area, ready to be committed.

The secret here—it's not really a secret, but it's not always advertised or taught correctly—is that Git already has every existing checked-out file in its index / staging-area. This thing—the thing Git calls either the index or the staging area, depending on which bit of Git documentation is doing this calling—holds in it your proposed next commit.
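
You can list the index with git ls-files --stage, and you can name the index copy of a file as :path. This sketch uses a throwaway repository with made-up names. Right after a commit, the index already holds both files. git add then replaces the index copy of notes.txt with a new blob, whose hash ID matches what git hash-object computes for the working-tree file.

```shell
# Minimal sketch in a throwaway repo; notes.txt and other.txt are made-up names.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "v1" > notes.txt
echo "unchanged" > other.txt
git add notes.txt other.txt
git commit -q -m "first"

# Right after a commit (or checkout), the index already holds every file.
git ls-files --stage                 # mode, blob hash ID, stage number, path
before=$(git rev-parse :notes.txt)   # ":path" names the copy in the index

echo "v2" > notes.txt
git add notes.txt                    # compress, store the blob, update the index entry
after=$(git rev-parse :notes.txt)
expected=$(git hash-object notes.txt)  # blob ID of the working-tree file's content
```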

When you first switch to some particular commit, Git removes from your working tree all the checked-out files that came out of whatever commit you were using, as recorded in its index. It also removes all the checked-out files from its index. Then it installs, into its index and into your working tree, all the files that go with the new commit you want to use. That is, if we did:

git switch main

we had all the files from H, but if we then decided to look at feature-1 instead and ran:

git switch feature-1

Git removed all the files from H and replaced them with the files from J, which is the commit to which the name feature-1 points.

There are a couple of important things to know about this remove-and-replace step:

  • First, Git only removes-and-replaces any files it has to. Git is already de-duplicating files, so it knows which files in H are the same in J. For any file that's the same in both commits, it can skip the R(emove)-and-R(eplace) job. For files that are in H but not in J, it has to do the first R—the remove—without a replace, and for files that are in J but not H, it has to just replace.

  • Second, Git only does an R-and-R for the files it knows about. But what are those files? That's where the index comes in.
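
The remove-and-replace step, and the fact that it leaves untracked files alone, both show up in a small experiment. This sketch assumes Git 2.23 or later for git switch. The branch feature-1 and all file names are made up, and the two commits stand in for H and J.

```shell
# Minimal sketch (Git 2.23+ for git switch) in a throwaway repo.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "shared" > common.txt
echo "only in H" > h-only.txt
git add common.txt h-only.txt
git commit -q -m "H"

git switch -q -c feature-1
git rm -q h-only.txt
echo "only in J" > j-only.txt
git add j-only.txt
git commit -q -m "J"

echo "scratch" > notes.tmp    # untracked: not in the index, so Git leaves it alone
git switch -q main            # remove j-only.txt, restore h-only.txt, keep common.txt
ls
```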

Because your working tree is a regular directory (or folder, if you prefer that term) with regular everyday files in it, you can create files that Git doesn't know about. These are your untracked files. That gives us the definition of an "untracked" file: an untracked file is one that is not in Git's index right now. This is extremely important, so let's repeat it.

An untracked file is a file that is not in Git's index right now

This is the key to making new commits, and also the key to .gitignore. Files that are in Git's index right now are tracked. Files that are not in Git's index right now are untracked. The "right now" part is important, because:

  • git switch and git checkout fill in the index; but
  • git add reads a working tree file and copies it into the index; and
  • git rm removes a file from both the working tree and the index.

This gives you several ways to change what's in the index / staging-area. (There are more ways, such as git restore, but we'll just cover these three for now.) You can also use git rm --cached to remove a file from Git's index without removing it from the working tree. So it's easy to take a file from tracked to untracked with git rm --cached, or from untracked to tracked with git add.
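
Since "tracked" simply means "in the index right now", you can test it with git ls-files --error-unmatch, which fails for paths that are not in the index. In this sketch, is_tracked is a hypothetical helper and big.bin is a made-up name. The sketch shows git add moving a file from untracked to tracked, and git rm --cached moving it back, while the file stays on disk.

```shell
# Minimal sketch in a throwaway repo; big.bin is a made-up name.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

# is_tracked: hypothetical helper; a path is tracked iff it is in the index.
is_tracked() { git ls-files --error-unmatch -- "$1" >/dev/null 2>&1; }

echo "data" > big.bin
is_tracked big.bin && s1=tracked || s1=untracked   # not added yet
git add big.bin
is_tracked big.bin && s2=tracked || s2=untracked   # git add: into the index
git commit -q -m "add big.bin"
git rm -q --cached big.bin                          # out of the index, still on disk
is_tracked big.bin && s3=tracked || s3=untracked
echo "$s1 -> $s2 -> $s3"                            # untracked -> tracked -> untracked
```

After that git rm --cached, the next commit will no longer contain big.bin, but earlier commits still do.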

Git builds new commits from whatever is in the index right now

When you run:

git commit

Git makes a snapshot of all the files that are in the index right then, in the form they have in the index at that time. So if you:

git switch main
<modify one file, say README.txt>
git add README.txt
<modify README.txt again>

you currently have three versions of README.txt: there's one unchangeable one in the current commit, a second one in Git's index, and a third one in your working tree. If you run git commit, it's that second one—the one in Git's index—that goes in the new commit.
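
You can read back each of the three copies individually. HEAD:path names the committed copy, :path names the index copy, and the working-tree copy is an ordinary file. This sketch uses a throwaway repository and follows the steps above, with echo standing in for the edits. It shows that git commit records the index copy.

```shell
# Minimal sketch in a throwaway repo; echo stands in for editing README.txt.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "version 1" > README.txt
git add README.txt
git commit -q -m "v1"

echo "version 2" > README.txt
git add README.txt              # copy version 2 into the index
echo "version 3" > README.txt   # modify the working-tree file again

in_commit=$(git show HEAD:README.txt)   # version 1: frozen in the current commit
in_index=$(git show :README.txt)        # version 2: the proposed next commit
in_tree=$(cat README.txt)               # version 3: the ordinary file on disk

git commit -q -m "commit whatever the index holds"
committed=$(git show HEAD:README.txt)   # version 2, not version 3
```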

(Note that git commit -a is roughly equivalent to git add -u && git commit. That is, the -a option merely does an update-add of all the files that are already in Git's index. So truly-new files require a separate git add step.)
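
Here is a small sketch of that difference, using a throwaway repository with made-up names. git commit -a picks up the change to the already-tracked file. The brand-new file stays untracked and does not appear in the commit.

```shell
# Minimal sketch in a throwaway repo; file names are made up.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "first"

echo "two" >> tracked.txt        # change a file that is already in the index
echo "new" > brand-new.txt       # create a file that is not in the index
git commit -q -a -m "commit -a"  # roughly git add -u && git commit

files=$(git ls-tree -r --name-only HEAD)
leftover=$(git status --porcelain)   # brand-new.txt is still untracked
```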

This is where .gitignore comes in

You can explicitly run:

git add some-file

or, if you like, you can run an en-masse "add everything here" with:

git add .

Both operations tell Git to add files: the first adds one named file, and the second adds every file in the current directory and its subdirectories. But some files should not be put in new commits, and it would be painful to force you to git add individual files instead of letting you use git add . to just make Git figure things out.

Besides this, the git status command will run two diffs:

  • The first diff compares the HEAD (current) commit to Git's index. For each file that is the same, Git says nothing at all. For each file that is different, including newly added or removed files, Git tells you about it, listing it under Changes to be committed. That's because whatever is in the index will be in the next commit, and if it's different, that's interesting. If you have a big project, with thousands of files, you probably don't care that dull/1 through dull/999 are all unchanged; what you care about is that important.code is changed. This git status diff will tell you that.

  • The second diff compares the index to your working tree. For each file that is the same, Git says nothing. For files that are different, Git tells you about them, saying not staged for commit. But new files—files that aren't in Git's index, but are in your working tree—are separated out. Git tells you about them not as "new" but rather as untracked.
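
The two comparisons, plus the list of untracked files, correspond to commands you can run separately. This sketch uses a throwaway repository with made-up names. git diff --cached compares HEAD with the index, plain git diff compares the index with the working tree, and git ls-files --others --exclude-standard lists untracked files that are not ignored.

```shell
# Minimal sketch in a throwaway repo; file names are made up.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "a" > staged.txt
echo "b" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "first"

echo "a2" > staged.txt && git add staged.txt   # index now differs from HEAD
echo "b2" > unstaged.txt                       # working tree differs from index
echo "c" > new.txt                             # not in the index at all

head_vs_index=$(git diff --cached --name-only)          # first diff
index_vs_tree=$(git diff --name-only)                   # second diff
untracked=$(git ls-files --others --exclude-standard)   # untracked, minus ignored
git status --short                                      # compact view of the same facts
```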

To prevent git status from being uselessly noisy about 10,000 untracked files that should stay untracked, and to make git add . useful by not adding those 10,000 files, Git reads the poorly-named file .gitignore. Files matching the patterns listed there are the ones that Git will (a) shut up about, and (b) not add even though you said git add ..

This has no effect on files that are already in Git's index. If the file is in Git's index, it is tracked and it will be in the next commit. Listing a file in .gitignore doesn't make it untracked and does not make Git ignore it: it just means that if it is untracked, Git won't complain about it and won't auto-add it.1
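
Here is a sketch of the case the question is really about, using a made-up big.bin in a throwaway repository. Once the file is tracked, listing it in .gitignore does not stop git add . from staging its changes. Removing the file from the index with git rm --cached is what makes it untracked. From then on, .gitignore keeps it out of new commits.

```shell
# Minimal sketch in a throwaway repo; big.bin is a made-up name.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "v1" > big.bin
git add big.bin
git commit -q -m "big.bin is tracked"

echo "big.bin" > .gitignore   # too late: big.bin is already in the index
echo "v2" > big.bin
git add .                     # still stages the change to tracked big.bin
git commit -q -m "oops"
in_commit=$(git show HEAD:big.bin)   # v2

# To stop tracking it, remove it from the index (the file stays on disk).
git rm -q --cached big.bin
git commit -q -m "stop tracking big.bin"
git ls-tree -r --name-only HEAD      # only .gitignore is left
```

Commits made before the git rm --cached still contain big.bin, so pushing those commits still uploads it.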


1 In fact, if you explicitly try to add it with git add listed-ignored-file, Git will tell you that the path is ignored by one of your .gitignore files. Unless you have disabled the hint, it will also tell you that you can override this with git add -f. Some people like to use this trick to ignore "everything" (with * in a .gitignore, for instance), and then force-add the few files that they don't want to ignore. I personally don't like to do this, but it does work. Just note that if you do use this trick, git rm --cached -r . && git add . will do bad things. The git rm step takes every force-added file out of the index, and the git add . step, obeying .gitignore, puts none of them back.
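
You can reproduce the trap described in this footnote in a throwaway repository with made-up file names. With * in .gitignore, the force-added files survive only as long as they stay in the index. Running git rm --cached -r . followed by git add . leaves the index empty.

```shell
# Minimal sketch in a throwaway repo; keep.txt is a made-up name.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "*" > .gitignore         # ignore "everything", including .gitignore itself
echo "keep me" > keep.txt
git add keep.txt 2>/dev/null || echo "refused: keep.txt is ignored (use -f)"
git add -f .gitignore keep.txt
git commit -q -m "force-added files"
before=$(git ls-files)

# The trap: re-adding "everything" re-adds nothing, because all of it is ignored.
git rm -r -q --cached .
git add .
after=$(git ls-files)
echo "before: $before"
echo "after: [$after]"        # empty: both files dropped out of the index
```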


Diagnosing whether a file is "ignored"

Suppose you have some repository in which someone, at some point, said to ignore *.zorg files. Maybe that was the right thing to do at that time, even. But now you do need to commit some or all of the files in which you store your bad guys, so you don't want *.zorg ignored.

You run ls and you see:

jean-baptiste.emanuel.zorg

You run git status and it doesn't list jean-baptiste.emanuel.zorg.

Is the file tracked and simply unchanged, or is it untracked (not present in Git's index right now) and ignored?

You can run:

git add jean-baptiste.emanuel.zorg

If it says nothing, the file was added; git status will now tell you whether it's changed. If it says that the file is explicitly ignored but you can use -f to override, you can git add -f jean-baptiste.emanuel.zorg to get it added.

You can also run:

git ls-files jean-baptiste.emanuel.zorg

If this says nothing, the file is not in Git's index, so it is untracked. Since git status didn't list it either, it must be ignored. Or you can run:

git check-ignore jean-baptiste.emanuel.zorg

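(consider adding -v as well, to see which .gitignore line matches). If this says nothing, either no ignore rule applies to the file, or the file is already tracked, perhaps because it's been force-added. By default git check-ignore does not report tracked files; add --no-index to check the patterns anyway.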
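
You can run the diagnosis above as a script. This sketch uses a throwaway repository and the made-up file name from the example. git status and git ls-files both stay silent about the file, and git check-ignore -v names the matching .gitignore line. After git add -f, git check-ignore goes quiet unless you give it --no-index.

```shell
# Minimal sketch in a throwaway repo; the .zorg file name comes from the example.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config commit.gpgsign false

echo "*.zorg" > .gitignore
git add .gitignore
git commit -q -m "ignore *.zorg"

echo "bad guy" > jean-baptiste.emanuel.zorg
status_out=$(git status --short)                        # empty: ignored file not reported
listed=$(git ls-files jean-baptiste.emanuel.zorg)       # empty: not in the index
rule=$(git check-ignore -v jean-baptiste.emanuel.zorg)  # source:line:pattern<TAB>path
echo "$rule"

git add jean-baptiste.emanuel.zorg 2>/dev/null || echo "refused: ignored, use -f"
git add -f jean-baptiste.emanuel.zorg
listed_after=$(git ls-files jean-baptiste.emanuel.zorg)
quiet_now=$(git check-ignore jean-baptiste.emanuel.zorg)             # tracked: not reported
rule_anyway=$(git check-ignore --no-index jean-baptiste.emanuel.zorg)
```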

Upvotes: 4

Kevin

Reputation: 1130

If you add the correct path to the file in the .gitignore, and the file is not already tracked, the file won't be added if you use git add ., and so it won't be pushed if you type git push.

So, short answer: no.

Upvotes: 0
