
git unexpectedly pushes untracked files

I have a local git repository with a .gitignore that is set to ignore some large files of the form *.lammpstrj. Running git ls-files produces the expected list of tracked files, and running git ls-files -o shows that untracked files are being properly ignored. However, when I push to origin on GitHub, git tries to push the large files to the remote repository, causing the push to fail. Am I correctly viewing the list of untracked files, or do I need to change something in git to prevent pushing ignored files? I've included the ls-files output and .gitignore file below.

List of cached files:

$ git ls-files
.gitignore
BONDS/BUILD
BONDS/blen
BONDS/bond_lengths.cpp
BONDS/distributions.cpp
BONDS/functions.h
BONDS/globals.h
BONDS/read_dump.cpp
BONDS/read_xyz.cpp
BONDS/unwrap.cpp
BONDS/write_traj.cpp
COMPILE
MSD/BUILD
MSD/COMPILE
MSD/calc_msd.cpp
MSD/functions.h
MSD/globals.h
MSD/msd
MSD/msd_time.cpp
MSD/read_dump.cpp
MSD/read_xyz.cpp
MSD/unwrap.cpp
README.md
RG/BUILD
RG/chain_stats.cpp
RG/distributions.cpp
RG/functions.h
RG/globals.h
RG/read_dump.cpp
RG/read_xyz.cpp
RG/rg
RG/rg_re_com.cpp
RG/unwrap.cpp
RG/write_traj.cpp
RUN
TESTS/COMPILE
TESTS/hello.c
TESTS/sizes
TESTS/sizes.c
TESTS/test
TESTS/test.c
TESTS/test.h
TRAJ/COMPILE
TRAJ/read_dump.cpp
TRAJ/read_xyz.cpp
TRAJ/unwrap.cpp
blen
msd
rg

List of other (untracked) files:

$ git ls-files . -o
BONDS/bond_lengths.o
BONDS/read_dump.o
BONDS/unwrap.o
BONDS/write_traj.o
MSD/calc_msd.o
MSD/msd_time.o
MSD/read_xyz.o
RG/chain_stats.o
RG/distributions.o
RG/read_dump.o
RG/rg_re_com.o
RG/unwrap.o
RG/write_traj.o
SAMPLE/generate.py
SAMPLE/hists.out
SAMPLE/lengths.out
SAMPLE/mol_traj.lammpstrj
SAMPLE/plot.p
SAMPLE/stats.out
SAMPLE/unwrapped_traj.lammpstrj
SAMPLE/wrapped_traj.lammpstrj

My .gitignore file:

*.[oa]
*.lammpstrj
SAMPLE/
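Note that `git ls-files -o` on its own lists every untracked file, including ignored ones. The ignore rules only apply when an exclude option such as `--exclude-standard` is given, and adding `-i` flips the listing to show only the ignored files. Here is a minimal sketch in a throwaway repository that reuses the three patterns above. It assumes only that `git` is on the `PATH`, and the files `calc_msd.o` and `notes.txt` are hypothetical:

```shell
#!/bin/sh
# Sketch: how git ls-files reports ignored vs. non-ignored untracked files.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same patterns as the .gitignore above.
printf '*.[oa]\n*.lammpstrj\nSAMPLE/\n' > .gitignore
mkdir SAMPLE
# Hypothetical files: the .o and .lammpstrj files match ignore rules, notes.txt does not.
touch SAMPLE/mol_traj.lammpstrj calc_msd.o notes.txt

# Every untracked file; ignore rules are NOT applied without an exclude option.
all_untracked=$(git ls-files -o)
# Untracked files that the standard ignore rules do not exclude.
not_ignored=$(git ls-files -o --exclude-standard)
# Only the untracked files that the standard ignore rules exclude.
ignored=$(git ls-files -o -i --exclude-standard)

printf 'ls-files -o:\n%s\n\n' "$all_untracked"
printf 'ls-files -o --exclude-standard:\n%s\n\n' "$not_ignored"
printf 'ls-files -o -i --exclude-standard:\n%s\n\n' "$ignored"

# Report which .gitignore line causes a given path to be ignored.
git check-ignore -v SAMPLE/mol_traj.lammpstrj
```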


Answers (1)

torek


Git transfers commits

First, remember that Git transfers (fetches and pushes) commits, rather than individual files. Sometimes files get dragged along by these commits. But what exactly does that mean, especially here? Let's take a look.

A commit is simply:

a complete snapshot of a work-tree (or more precisely, of what was in a staging area or "index")
with an author and a committer (name, email address, and timestamp)
plus a commit message
and some set of parent commits.
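You can see these parts directly with `git cat-file -p`, which prints a commit object the way Git stores it. The output has a `tree` line for the snapshot, `parent` lines, `author` and `committer` lines with timestamps, and then the message. Below is a minimal sketch in a scratch repository. The file `hello.c` and the identity `demo <demo@example.com>` are placeholders:

```shell
#!/bin/sh
# Sketch: print a raw commit object to see its snapshot, parent, people, and message.
set -e
# Placeholder identity so the sketch does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'int main(void) { return 0; }' > hello.c
git add hello.c
git commit -q -m 'first commit'
echo '/* a second change */' >> hello.c
git commit -q -am 'second commit'

# "tree" names the snapshot, "parent" links to the previous commit,
# "author"/"committer" carry name, email and timestamp; the message follows.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"
```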

It's that first part, the "snapshot of a tree of files", that makes it seem like Git pushes files. And in fact, the way Git works underneath, when Git fetches or pushes a particular commit, it must also transfer all the files that go with it, unless they're already there.

Remember that each commit is a complete snapshot. This means that if you have a chain of commits, ending in a branch tip commit:

...--A--B--C   <-- branch-tip

and commit B has only changed—or even more likely, added—one file vs A, and C has only deleted that one file as compared to B, then all those other files in B and C are the same as they are in A. In fact, if you added a file to B, and then deleted it again in C, the entire set of files in C match the set in A. Let's assume, for the rest of this, that this is what you did: add something and commit it to make B, then remove it again to make C. (More likely, you added and/or modified several or many things, maybe more than once, and somewhere along the way you added one of the "forbidden" files, then later deleted it, in a longer chain of commits than just B--C.)
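Here is a sketch of exactly this A--B--C history in a scratch repository, assuming `git` and POSIX `dd` are available. A 100 kB file of zeros stands in for a real `.lammpstrj` dump. The tags `A`, `B` and `C` are added only so the commits are easy to name:

```shell
#!/bin/sh
# Sketch: commit A, add a big file in B, delete it again in C, then compare trees.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: code only'
git tag A

# Stand-in for a large trajectory dump: 100 x 1000 zero bytes.
mkdir SAMPLE
dd if=/dev/zero of=SAMPLE/mol_traj.lammpstrj bs=1000 count=100 2>/dev/null
git add SAMPLE/mol_traj.lammpstrj
git commit -q -m 'B: accidentally add trajectory'
git tag B

git rm -q SAMPLE/mol_traj.lammpstrj
git commit -q -m 'C: remove trajectory again'
git tag C

# Each commit points at exactly one top-level tree (its snapshot).
tree_a=$(git rev-parse 'A^{tree}')
tree_b=$(git rev-parse 'B^{tree}')
tree_c=$(git rev-parse 'C^{tree}')
echo "A: commit $(git rev-parse A) tree $tree_a"
echo "B: commit $(git rev-parse B) tree $tree_b"
echo "C: commit $(git rev-parse C) tree $tree_c"
```

A and C print the same tree hash but different commit hashes. B's tree differs because it includes SAMPLE/mol_traj.lammpstrj.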

It's very common for commits to share a whole lot of files, in this manner. (Also, git push and git fetch generally compress stuff when pushing and fetching, through clever use of what Git calls a "thin pack", but none of that matters in terms of finding and fixing the problem here.)

`git fetch` and `git push` involve two Git repositories

Each Git repository is its own complete, stand-alone entity. Repositories are, at least in principle, peers: your repository is not superior to the other one, nor inferior to it. All we can really say for sure is that your repository is yours, and theirs is theirs. You and they (whoever "they" are) are free to impose a boss/employee relationship, or any other kind, on this, but that's between you and them, not something Git itself cares about.

Whether you're pushing or fetching, your Git talks to their Git to find out which commits you have that they don't and that you want to send (git push), or which commits they have that you don't and that you want to get (git fetch). Those commits may or may not have associated files, which Git calls "blobs", and you and/or they may already have those files, whether or not you and they have those commits.

For instance, suppose you go to push commits B and C to some other Git repository that already has commit A. Your Git will hand their Git both commits—and all the files that they don't have, which is just the one added file in B.
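One rough way to see the objects involved is `git rev-list --objects`. It lists the commits in a range, plus the trees and blobs reachable from them but not from the excluded end. `git push` runs its own negotiation with the other side, so treat this only as an approximation of what it sends. In this sketch, tag `A` stands in for the commit the other repository already has:

```shell
#!/bin/sh
# Sketch: list objects reachable from HEAD (commit C) but not from A, roughly
# the set a push of B and C must send to a repository that already has A.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: code only'
git tag A

mkdir SAMPLE
dd if=/dev/zero of=SAMPLE/mol_traj.lammpstrj bs=1000 count=100 2>/dev/null
git add SAMPLE
git commit -q -m 'B: accidentally add trajectory'
git rm -q -r SAMPLE
git commit -q -m 'C: remove trajectory again'

# Commits print as "<id>", trees and blobs as "<id> <path>".
to_send=$(git rev-list --objects A..HEAD)
printf '%s\n' "$to_send"
```

The listing contains commits B and C, B's trees, and the trajectory blob. It does not contain main.cpp, because that blob is already part of A.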

If you, in your own repository, ask your Git to show you what is in your current commit, and that's the branch tip commit C, you will see the same set of files that they have in their commit A, which is exactly the same as your commit A. (This is in fact where those big ugly SHA-1 hashes come from: they uniquely identify the exact Git object, so that your Git and their Git can both tell that you both have commit A. Note that even though A and C have the same tree, they're different commits. They have different time-stamps; but even if they didn't, they also have different parents: the parent of A is a commit we can't see, off to the left, while the parent of C is B.)
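Those hashes are computed from content. For a file (blob), the ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the file's bytes. The small sketch below checks this against `git hash-object`. It assumes the default SHA-1 object format and GNU coreutils' `sha1sum`; on macOS, `shasum` would be the substitute:

```shell
#!/bin/sh
# Sketch: a blob ID is SHA-1 over "blob <size>", a NUL byte, then the raw content.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'test content\n' > file.txt

# wc may pad its output with spaces on some systems, so strip them.
size=$(wc -c < file.txt | tr -d ' ')
by_hand=$( { printf 'blob %s\0' "$size"; cat file.txt; } | sha1sum | cut -d ' ' -f 1 )
by_git=$(git hash-object file.txt)

echo "computed by hand: $by_hand"
echo "git hash-object:  $by_git"
```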

What you don't see, just by looking at C, is that commit B requires one extra file. If that file violates some rule that the other Git is enforcing, then your git push will get rejected, because they will inspect both commit B and commit C, and find the file in question.
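To find which blobs in the outgoing commits are large, you can feed the same object listing to `git cat-file --batch-check`, whose `%(rest)` atom carries each path through. This matters with GitHub in particular, which rejects pushes containing files larger than 100 MiB. The sketch below uses a scratch A--B--C history in which tag `A` stands in for what the other side has; in a real clone, a range such as `@{upstream}..HEAD` would take its place. The 50000-byte threshold is an arbitrary demo value:

```shell
#!/bin/sh
# Sketch: report blobs above a size threshold among the objects a range would send.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: code only'
git tag A

mkdir SAMPLE
dd if=/dev/zero of=SAMPLE/mol_traj.lammpstrj bs=1000 count=100 2>/dev/null
git add SAMPLE
git commit -q -m 'B: accidentally add trajectory'
git rm -q -r SAMPLE
git commit -q -m 'C: remove trajectory again'

threshold=50000   # bytes; hypothetical cutoff for this demo
# %(rest) echoes whatever followed the object ID on each input line (the path).
big=$(git rev-list --objects A..HEAD |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
      awk -v limit="$threshold" '$1 == "blob" && $3 > limit { print $3, $4 }')
echo "$big"
```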

What to do about it

In general, for items like this, the answer is to "rewrite" your own commit history: to replace commits B and C, for instance, with just one new commit, or even to remove B and C entirely.
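For the two-commit case, one way to replace B and C with a single commit is `git reset --soft` back to A followed by a new commit. `git rebase -i` with `squash` or `fixup` does the same thing interactively. The sketch below uses the soft-reset route on a scratch history. In it, B makes a real change to `main.cpp` and also adds the trajectory file by mistake, and C deletes the file again. All names are placeholders:

```shell
#!/bin/sh
# Sketch: replace commits B and C with one new commit whose snapshot equals C's,
# so the large file no longer appears in any commit on the branch after A.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: code only'
git tag A

# B: a real change plus an accidental trajectory file.
echo 'more source code' >> main.cpp
mkdir SAMPLE
dd if=/dev/zero of=SAMPLE/mol_traj.lammpstrj bs=1000 count=100 2>/dev/null
git add main.cpp SAMPLE
git commit -q -m 'B: new code, plus trajectory by mistake'
# C: remove the trajectory again.
git rm -q -r SAMPLE
git commit -q -m 'C: remove trajectory'
tree_c=$(git rev-parse 'HEAD^{tree}')

# Move the branch back to A but keep the index (C's snapshot), then commit it.
git reset -q --soft A
git commit -q -m 'New code'

git log --oneline A..HEAD
```

The old B and C remain in the local repository, reachable through the reflog, until garbage collection. Since no branch refers to them anymore, a normal branch push has no reason to send them.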

The trickiest part is usually figuring out "what they have" vs "what you have", which determines which commits your Git will send. Typically, though, if they're your peer—or even your origin—you can use git fetch to get an up-to-date set of copies of their commits, i.e., "what they have", and then use git log, optionally with --graph --oneline --decorate, to see what you have that they don't. For instance, if you are keeping track of their repository under your name origin, you can just do:

git fetch origin

to get your Git to bring over their commits, and then:

git log --graph --oneline --decorate @{upstream}..HEAD

to see what your Git has that theirs does not. (The --decorate is really only useful to determine if they're sharing some of your commits on some other branch(es); if so, rebasing, which copies commits, may result in more headaches, and you'll need to be extra careful.)
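You can try the same fetch-then-compare steps without a network by using a local bare repository as `origin`. The sketch below assumes only `git`. The paths and commit messages are placeholders, and `git push -u` is what sets the `@{upstream}` used afterwards:

```shell
#!/bin/sh
# Sketch: a local bare repository plays "origin"; then ask which commits the
# current branch has that its upstream does not.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q --bare "$base/origin.git"
# Cloning an empty repository prints a warning; it is harmless here.
git clone -q "$base/origin.git" "$base/work" 2>/dev/null
cd "$base/work"

echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: shared with origin'
git push -q -u origin HEAD      # origin now has A; also sets @{upstream}

echo 'more code' >> main.cpp
git commit -q -am 'B: local only'
echo 'even more code' >> main.cpp
git commit -q -am 'C: local only'

git fetch -q origin
git log --graph --oneline --decorate '@{upstream}..HEAD'
ahead=$(git rev-list --count '@{upstream}..HEAD')
echo "commits not yet on origin: $ahead"
```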

Typically this will just show you your own commits—with, if you're lucky and/or careful, no merges—and you can then use git rebase to elide or fix up commits that contain unwanted files. Use:

git log --oneline --name-status @{u}..

to view the same commits (the --graph and other options can also be used here if you like), with the equivalent of git diff --name-status output added after each commit's one-line log message, so that you can find out which commit(s) modify or add files that you don't want to send after all.
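Here is a sketch of that inspection on a scratch A--B--C history, with tag `A` standing in for `@{u}`. The second query narrows the listing with `--diff-filter=A` and a `*.lammpstrj` pathspec, so it shows only commits that add such files. In a git pathspec, `*` also matches across `/`, so files in subdirectories are included:

```shell
#!/bin/sh
# Sketch: per-commit file status for a range, then only the commits that ADD
# *.lammpstrj files (the ones to drop or fix up during a rebase).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'source code' > main.cpp
git add main.cpp
git commit -q -m 'A: code only'
git tag A

mkdir SAMPLE
dd if=/dev/zero of=SAMPLE/mol_traj.lammpstrj bs=1000 count=100 2>/dev/null
git add SAMPLE
git commit -q -m 'B: accidentally add trajectory'
git rm -q -r SAMPLE
git commit -q -m 'C: remove trajectory again'

# One-line log plus A/M/D status lines for each commit in the range.
git log --oneline --name-status A..HEAD

# Only commits in the range whose changes include an added *.lammpstrj file.
offenders=$(git log --format='%h %s' --diff-filter=A A..HEAD -- '*.lammpstrj')
echo "commits adding .lammpstrj files:"
echo "$offenders"
```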