Reputation: 5488
What am I trying to do:
Replace a single local file (app.py) with the previously committed remote file.
What have I tried:
I can git add a file as follows:
git add app.py
git commit -m "added x "
git push -u sec_aggregator
But when I want to replace the local app.py on my pc with the remote one I tried this:
git fetch
git checkout sec_aggregator/feed/app.py
and got this error:
error: pathspec 'sec_aggregator/feed/app.py' did not match any file(s) known to git
How do you get the correct path please?
Update:
When I do the command git branch
I get this output:
list
* master
Upvotes: 1
Views: 619
Reputation:
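To replace a file with the previously committed version you can do:
git checkout HEAD -- app.py
Plain git checkout app.py also works, but it copies the file from the index, so it will not undo changes you have already staged with git add.
Note that git reset --hard does not accept a file name: git reset --hard app.py fails with "Cannot do hard reset with paths", and git reset --hard without a path discards the changes to every tracked file.
Please make sure that you don't have unsaved changes to app.py
in the working directory before running this. The operation is not working-directory safe, so your local changes to that file will be lost.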
Upvotes: 0
Reputation: 488183
Let me start with a question. You ran:
git add app.py
and apparently this worked (you show no error and the subsequent git commit
seems to have succeeded as well). So, should you now run:
git checkout sec_aggregator/feed/app.py
? Or would it make more sense to run:
git checkout app.py
? That is, why do you expect to use the full file name sec_aggregator/feed/app.py
in git checkout
, and the partial (relative to current directory) file name app.py
in git add
? Have you navigated to a different directory / folder within your work-tree?
What you probably want here is:
git checkout <some-commit-specifier> -- app.py
since git checkout
will use your current folder/directory within your work-tree to resolve the file's full name, the same way that git add
does. The some-commit-specifier
part here may be as simple as HEAD~1
.
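Here is a minimal sketch of that command in a throwaway repository. The temporary directory, the identity settings, and the file contents are all made up for illustration; only the feed/app.py layout is borrowed from the question.

```shell
set -eu
# Throwaway repository; every name and value here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir feed
echo "version 1" > feed/app.py
git add feed/app.py
git commit -q -m "first version"

echo "version 2" > feed/app.py
git add feed/app.py
git commit -q -m "second version"

# From inside feed/, the path is relative, just as it would be for git add.
cd feed
git checkout HEAD~1 -- app.py
restored=$(cat app.py)
echo "$restored"   # version 1
```

Note that the restored copy is also placed in the index, so git status will report app.py as staged for commit.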
This might be all you need to make progress; if so, feel free to ignore the rest of this answer. :-) But this is not the only unusual thing in your question. As Julian noted, you seem to be using the string sec_aggregator
as the name of a remote. This is a pretty unusual remote-name; most people have a repository with only one remote, named origin
, and some have a repository in which they have added a second remote, typically called upstream
. It is possible to use pretty much any alphanumeric name as a remote, so sec_aggregator
is OK here, but I suspect it's not what you mean to use. (What was the exact output of this git push
anyway?)
So, let's take a look at what a Git repository is, and how different Git repositories talk to each other, because this all relates to your ultimate goal, which I'll repeat here:
[I want to r]eplace a single local file (app.py) with the previously committed remote file.
Although there is a lot of extra machinery to make this all work, at its heart, a Git repository is a collection of commits.1 That is, Git doesn't store files but rather commits. This collection is a sort of key-value database, with the keys being commit hash IDs (which we will get to in a moment). Each commit itself stores files—a full, complete snapshot of each file as of its state at the time you (or whoever) made the commit—so if you go down one level, into the commits, you do get files; but the storage-unit, as it were, is the commit.
The files stored in a commit are frozen for all time. They can never be changed: not by you, nor by Git. As you can imagine, re-freezing every file for every commit could use up a lot of storage space. So Git doesn't actually do that. Instead, each frozen file is in a special, read-only, Git-only, compressed form. I like to call this the freeze-dried format, although that's not a formal Git term. That means they take less space, and sometimes a Git repository database is smaller than the files it stores. But there's an even more useful trick here. If the previous version of a file is frozen, and we need to make a new commit that has the same version, why not just re-use the freeze-dried copy of the file? And that's exactly what Git does. Commits keep re-using existing freeze-dried file copies, so that the growth of the .git
directory holding the internal database is controlled.
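The re-use of stored file contents can be observed directly. This is a sketch in a scratch repository, with invented file names (keep.txt, edit.txt): it asks Git for the internal object ID of each file in two successive commits.

```shell
set -eu
# Throwaway repository; file names and contents are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "unchanged" > keep.txt
echo "v1" > edit.txt
git add keep.txt edit.txt
git commit -q -m "first"

echo "v2" > edit.txt
git add edit.txt
git commit -q -m "second"

# <commit>:<path> names the stored copy of a file; rev-parse prints its object ID.
keep_old=$(git rev-parse HEAD~1:keep.txt)
keep_new=$(git rev-parse HEAD:keep.txt)
edit_old=$(git rev-parse HEAD~1:edit.txt)
edit_new=$(git rev-parse HEAD:edit.txt)

echo "keep.txt: $keep_old -> $keep_new"   # same ID: the stored copy is shared
echo "edit.txt: $edit_old -> $edit_new"   # different IDs: a new copy was stored
```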
So, each commit stores files—i.e., data—but that's not all. A commit also stores some metadata, some information about the commit itself. For instance, each commit has an author and a committer, usually the same person. When you make a commit, you become the author-and-committer. Each commit has a date-and-time-stamp too, or perhaps I should say two, as there's one for the author line and one for the committer line. You also get to supply a log message giving a reason for the commit. But there is one more key item in every commit, which Git calls the parent, or for merge commits, parents, plural.
Now, every commit acquires—at the point it comes into existence—a new and unique hash ID. The hash ID is a big ugly string of letters and numbers that, technically, is the hexadecimal representation of a cryptographic checksum of the contents of that commit. One reason we can't know what the hash ID of a new commit will be until we've made it is that date-and-time-stamp part: if you make a commit, then remove it,2 then make it again, the second one will have a different time in it and will actually be a different commit, with a different hash ID, even if it stores all the same file contents.
The hash ID is, in effect, the true name of a commit. Crucially, every Git in the universe will agree that that hash ID—whatever it is—is the one and only correct hash ID for that particular commit: the one with that snapshot of those files, made by you, with your log message, on the date-and-time at which you made it. So Git can use this hash ID as a unique key, in a key-value database, to store and retrieve this commit, and—importantly—your Git can call up another Git and ask it: Do you have this hash ID? If they—the other Git—do have that hash ID, they have your commit. If not, they don't.
In addition to all of this, your Git stores the hash ID of the previous commit—the one that goes before this commit—in the metadata for this commit. So, given a commit, Git can look at its parent and get the hash ID of the previous commit. Your Git can then retrieve that commit from its database of commits. That commit has a snapshot of all of the files as of the time you (or whoever) made that commit, and it, too, has a parent hash ID, which locates yet another earlier commit, and so on.
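To see this metadata for yourself, git cat-file -p prints the raw contents of a commit. The sketch below uses a scratch repository with two made-up commits and checks that the parent line of the newer commit is the hash ID of the older one.

```shell
set -eu
# Throwaway repository; names and messages are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "second" >> notes.txt
git add notes.txt
git commit -q -m "second commit"

# Prints the tree, parent, author and committer lines, then the log message.
git cat-file -p HEAD

# The parent line holds the hash ID of the commit that came before.
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
previous_commit=$(git rev-parse HEAD~1)
echo "parent recorded in commit: $recorded_parent"
echo "previous commit:           $previous_commit"
```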
This means that Git can start with the most recent or last commit, at the end of a long backwards-looking chain, and work back to the very first commit you (or whoever) made. This very-first commit does not have any parent hash ID, simply because it can't: there was no earlier commit to connect back to. Except for cases where your string of commits branches or merges—which I won't cover properly here—this means your Git repository structure is really simple:
... <-F <-G <-H
Here, the uppercase letters stand in for the actual hash IDs of each commit. We draw them as pointing to their parent commits. Now Git just needs some way to remember the actual hash ID of the last commit, H
, and that's where a branch name comes in:
...--F--G--H <-- master
The branch name master
holds the raw hash ID of commit H
, so that H
is the last commit on master. To make a new commit, you modify some file, git add
it, and run git commit
. The git commit
gathers your log message (in this case, from your -m
argument) and uses your name and email address and the current date-and-time to set up most of the metadata. It uses the hash ID of the current commit H
—which is stored in the name master
—as the parent for the new commit. It freezes all the files into the new commit, which we'll call I
. Let's draw it in:
...--F--G--H <-- master
            \
             I
Now that commit I
exists, and has frozen-for-all-time copies of all of the files—a new snapshot—Git simply changes the hash ID stored in the name master
, so that master
now points to commit I
instead of commit H
:
...--F--G--H
            \
             I <-- master
Like each stored file's contents, the commit's contents can never be changed.3 All commits are permanent (mostly—see footnote 2) and read-only (totally).
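The way the name master moves can be checked in a scratch repository. In this sketch the commit messages "H" and "I" only mirror the letters in the drawings; the real hash IDs will be different every time you run it.

```shell
set -eu
# Throwaway repository; the branch is forced to be called master for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "H"
before=$(git rev-parse master)      # hash ID of commit "H"

echo "two" > file.txt
git add file.txt
git commit -q -m "I"
after=$(git rev-parse master)       # master now names commit "I"
parent_of_new=$(git rev-parse master~1)

echo "master before: $before"
echo "master after:  $after"
echo "parent of new: $parent_of_new"   # same as "master before"
```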
1This extra machinery is deliberately exposed, in Git, so it's possible to (ab)use Git to store files directly, e.g., through the use of tags. But that's not how it's designed to work.
2It's a little bit tricky to remove a commit, but it is possible. Essentially, you have to make the commit un-find-able first. This gets into the notion of reachability in a graph, which again we won't go into here, but see Think Like (a) Git for more about this.
3The reason for this is that the hash IDs are cryptographic checksums of the contents. Make any change, and what you have is a new and different internal object, with a new and different checksum. The old object is still there in the database. You didn't change a file, or a commit: you just made a new file, or a new commit.
A branch name like master simply identifies the last commit in the chain. From here, Git can work backwards to earlier commits.
Git finds commits by starting from a branch name like master, and working backwards.
You do not normally need to type raw hash IDs, because you can use a name like master. But you can run git log or use other tricks to find the hash IDs, which may matter in a moment.
Commits, and their stored files, are frozen for all time. That's great for archival, but useless for getting any new work done. You need to be able to take all the files out of some commit, un-freezing and re-hydrating them. Git therefore sets up a work area, which Git calls your work-tree or working tree or other variations along this line. This folder, plus any sub-folders Git needs to create, holds the files extracted from the commit. The files have full names, e.g., feed/app.py, so your computer requires that the sub-folder feed exist before the file can be written. These folders aren't stored in Git;4 the files just have full names that force Git to create folders to hold them.
In any case, checking out some branch-and-commit, as in git checkout master
, tells Git: extract all the files from that commit, into my work-tree, so that I can see them and work with them. The branch is the argument you gave to git checkout
, and the commit is based on the hash ID stored in the branch name. In our drawings above, that was first commit H
, then commit I
after we made the new one.
As we saw above, to make a new commit, you can just work with the work-tree file—it's an ordinary file and you can do anything to it that your computer will let you do to it—and then run git add
on it. But why do you have to git add
the file every time you change it?
This is where Git's index comes in. The index or staging area (or sometimes, rarely these days, the cache) holds a copy of each file that git checkout
checked out. This copy is in the freeze-dried format, ready to go into the next commit.5 Initially, that's just the actual copy from the previous commit. Running git add
has Git compress / freeze-dry the updated contents and put those into the index.
What this means is that, at all times, the index holds copies of the files you propose to put into the next snapshot.6 Initially, the index matches the current commit. You then change work-tree files, but the index still matches the current commit: nothing new is staged for commit
yet. Then you git add
the file to replace the index copy with a freeze-dried version of the work-tree copy. Now something is staged for commit
.
In other words, at all times, you have three active copies of each file. Let's consider the file README.md
for a nice concrete example. Note that the syntax with the colon in it is special to Git, i.e., HEAD:README.md
and :README.md
won't work with most commands on your computer. But git show
, and some other Git commands, use this commit:path
syntax. (Annoyingly, the colon means something different in git fetch
and git push
.)
HEAD:README.md is the frozen copy in the current commit. (Use git show HEAD:README.md to view it.) You cannot change this one, but it's easy to access.
:README.md is the frozen-format copy in the staging area, i.e., the index. (Use git show :README.md to view it.) You can change this one, by replacing it. Use git add to replace it from the third copy, which is:
README.md is an ordinary file in your work-tree. Use ordinary commands to view it or change it or whatever.
In fact, all frozen copies in every commit are accessible at all times, but the one (or ones) in the current commit has (have) a special role. This is in part because HEAD
plays a big part in what git status
says.
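The three copies of README.md can be made to differ on purpose, which makes them easy to tell apart. This is only a sketch in a scratch repository, and the file contents are invented.

```shell
set -eu
# Throwaway repository; contents are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "committed" > README.md
git add README.md
git commit -q -m "add README"

echo "staged" > README.md
git add README.md                  # replaces the index copy
echo "work-tree" > README.md       # changes only the ordinary file

in_head=$(git show HEAD:README.md)   # committed
in_index=$(git show :README.md)      # staged
in_worktree=$(cat README.md)         # work-tree
echo "$in_head / $in_index / $in_worktree"
```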
The git status
command will tell you that some files are staged for commit and other files are not staged for commit. To do that, git status
runs two separate comparisons. The first one is HEAD
-vs-index. The second one is index-vs-work-tree:
First, for each file in HEAD
, compare it to the one in the index. Do they match? If so, say nothing. If not, say that this file is staged for commit.
Next, for each file in the index, compare it to the one in the work-tree. Do they match? If so, say nothing. If not, say that this file is not staged for commit.
There are some extra cases, e.g., files in your work-tree that aren't in your commit, or files that have been removed from your index and/or work-tree. But the above is the heart of what git status
tells you about the next commit you would make if you ran git commit
right now—the staged for commit files—or the commit you could make if you ran more git add
commands.
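The two comparisons can also be run by hand with git diff. The sketch below assumes two invented files, staged.txt and unstaged.txt, in a scratch repository: git diff --cached compares HEAD with the index, and plain git diff compares the index with the work-tree.

```shell
set -eu
# Throwaway repository; file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > staged.txt
echo "b" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

echo "a, edited" > staged.txt
git add staged.txt                 # index now differs from HEAD
echo "b, edited" > unstaged.txt    # work-tree now differs from index

head_vs_index=$(git diff --cached --name-only)
index_vs_worktree=$(git diff --name-only)
echo "staged for commit:     $head_vs_index"
echo "not staged for commit: $index_vs_worktree"

# git status reports both comparisons together.
git status --short
```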
Crucially, your work-tree, and the index that sits between your work-tree and the Git repository, are specific to this particular Git repository. When you have your Git call up some other Git, they will exchange commits. Your index and your work-tree are private: the other Git cannot see yours. By the same token, their index, and their work-tree if they have one,7 are also private. You cannot see theirs.
4They sort of are, and sort of aren't. The end result is mostly "aren't" as you cannot store an empty directory in Git.
5Technically, the blob object is actually already in the Git database. The index just refers to that object by its hash ID. When you git add
an updated file, this will create a new blob object if needed, and the index will now refer to the new blob object. If the contents you git add
match those of some stored version, anywhere in the repository, Git will re-use the existing blob object.
6The index takes on an expanded role during a conflicted merge. It also has some other uses. Saying the index represents your next commit is not wrong, but not really complete—but this answer is already long enough.
7If you use GitHub or Bitbucket or any other similar web hosting service, they have bare repositories that have no work-tree. In order to git push
to some server repository, that server repository is typically also created with --bare
. This sidesteps a bunch of problems that could occur if you want to update a branch that they have checked-out. Without a work-tree, they cannot check out any branch.
The special name HEAD
refers to both your current branch and your current commit. (Git has two ways to ask about HEAD
; one produces the branch name, the other produces the commit hash ID.) While the commits themselves cannot be changed, you can select any branch name for HEAD
and therefore change which commit is the current commit. Adding a new commit automatically updates the branch name, so that the new commit you just made is the current commit now.
The index or staging area holds every file that will go into your next commit.
The work-tree is where you can see and work with your files. Git doesn't really need this at all: Git's main concern is with the repository itself, and then with the index since that's the source of new commits. But Git has to provide a work-tree, so that you can actually use Git.
The work-tree and index are private to this Git repository.
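The two ways of asking about HEAD look like this in practice. It is a sketch in a scratch repository whose branch is deliberately named master; git symbolic-ref and git rev-parse are the two commands I have in mind, though there are other ways to get the same answers.

```shell
set -eu
# Throwaway repository; the branch is forced to be called master for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "initial"

branch=$(git symbolic-ref --short HEAD)   # asks for the branch name
commit_id=$(git rev-parse HEAD)           # asks for the commit hash ID
echo "HEAD as a branch: $branch"
echo "HEAD as a commit: $commit_id"
```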
Note that as humans, we tend to be concerned with our work-tree files. But Git isn't, not very much anyway. This all comes back to the key problem with your desire:
... I want to replace the local app.py on my pc with the remote one ...
There isn't a remote one, as far as Git itself is concerned. Your Git is going to call up another Git. That other Git has commits. Your Git either already has all of their commits, or will get any new ones from them if needed. That's what git fetch
does: your Git calls up their Git, asks their Git about their branches and tags and so on, and collects from them any new commits they have that you don't. Once it has all their commits added to your database, git fetch
is done and the two Gits stop talking to each other.
You may well already have all of their commits. If so, there's no need to git fetch
here at all. That's probably the case above: you started with all of their commits, then you made one new commit and gave them that one via git push
. You still have all of their commits (and they now have all of yours too).
In any case, now that you have all of their commits, you can select any of those commits—which are all in your repository now—and either check out that entire commit, or selectively extract individual files from individual commits. This is where git checkout
gets quite messy. Git 2.23 adds new commands that you can use instead of git checkout
, to help keep this stuff straighter, but we'll just talk about git checkout
and git show
here.
Using git checkout
, you can tell your Git:
Using git show
, you can tell your Git:
Note that this git checkout
will overwrite your current index copy. (You can fix or change this later if you want, using, e.g., git reset
.) The git show
won't do this.
To achieve the first one, you must pick a commit. The easiest way is often to use git log
to find the hash ID, then cut-and-paste that ID with your mouse:
git checkout <hash-id> -- path/to/file
To achieve the second one, pick a commit again, and run:
git show <hash-id>:path/to/file
There's a bit of weirdness here. With git checkout
, the path to the file is relative to where you are in the work-tree, just as it is with git add
. If you're in the sub-folder feed
and the file is named feed/app.py
, you just use app.py
here. But with git show
, you must use the full name of the file, or write git show hash-id:./app.py
. (Internally, this has to do with the fact that git add
and git checkout
take arguments that Git calls pathspecs, while git show
doesn't. But in practice it's just Git being messy and a bit hard to use.)
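The difference in path handling is easiest to see from inside a sub-folder. This sketch re-creates the feed/app.py layout in a scratch repository (the contents are invented) and tries each form of the path from within feed.

```shell
set -eu
# Throwaway repository; contents are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir feed
echo "version 1" > feed/app.py
git add feed/app.py
git commit -q -m "first version"
echo "version 2" > feed/app.py
git add feed/app.py
git commit -q -m "second version"

cd feed
by_full_path=$(git show HEAD~1:feed/app.py)   # full name from the top level
by_dot_path=$(git show HEAD~1:./app.py)       # ./ means "relative to here"

# Without ./ the name is looked up from the top level, where no app.py exists.
if git show HEAD~1:app.py >/dev/null 2>&1; then
    bare_name="found"
else
    bare_name="not found"
fi

# git checkout takes a path relative to the current directory.
git checkout HEAD~1 -- app.py
checked_out=$(cat app.py)
echo "$by_full_path / $by_dot_path / $bare_name / $checked_out"
```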
As shown above, you can name a commit by its hash ID. You can use git log
, perhaps with --all --decorate --oneline --graph
—one of my favorite Git commands—to find lots of hash IDs, which you can then cut and paste. You can use a branch name like master
to mean the commit that the name points to. For instance, if your graph looks kind of like this (though git log --graph
will draw it vertically instead of horizontally):
...--F--G--H <-- master
\
K--L <-- test
you can use the name test
to mean commit L
, just as the name master
means commit H
. And, when you have a remote like origin
and run git fetch
, your Git sets up all your remote-tracking names, like origin/master
, to remember what your Git got from their Git when your Git called them up and said: Hey, what commits do you have, under what branch names?
So, if you had:
...--F--G--H <-- master, origin/master
and then you added a new commit I
to the end of your master, you now have:
...--F--G--H <-- origin/master
            \
             I <-- master
So now instead of finding the actual hash ID for commit H
, you can use the name origin/master
to identify it. Hence:
git show origin/master:feed/app.py
will let you see the version that is still in commit H
, which is still identified by your origin/master
.
Your origin/master
will be automatically updated on each git fetch
to origin
: your Git calls up their Git, asks it what branches they have, and they say "I have master
and it's commit big-ugly-hash-ID". Your Git gets that commit if you don't have it yet—along with all the earlier commits you need too—and now you have it. Then your Git sets your origin/master
to remember that big ugly hash ID.
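To watch origin/master being updated, you need a second repository that pushes something new. The sketch below fakes this on one machine: seed, origin.git, mine, and theirs are hypothetical directory names, and the identity comes from environment variables so that it applies to every repository in the demo.

```shell
set -eu
# Identity for every repository in this demo; values are illustrative.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Build one commit on master, then publish it as a bare "server" repository.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "version 1" > seed/app.py
git -C seed add app.py
git -C seed commit -q -m "first version"
git clone -q --bare seed origin.git

# Two independent clones of the same origin.
git clone -q origin.git mine
git clone -q origin.git theirs

# Someone else commits and pushes.
echo "version 2" > theirs/app.py
git -C theirs commit -q -a -m "second version"
git -C theirs push -q origin master

before=$(git -C mine rev-parse origin/master)   # still the old commit
git -C mine fetch -q origin
after=$(git -C mine rev-parse origin/master)    # now the pushed commit
theirs_tip=$(git -C theirs rev-parse master)
echo "before fetch: $before"
echo "after fetch:  $after"
```

Until the git fetch runs, the clone named mine has no way to know that anything changed on the other side.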
So, this gets us to the final, and maybe best, way to view or extract that file from their commit: run git fetch
if needed, then use git show
or git checkout
with the name origin/master
—assuming you call that other Git origin
, as most people do—and the appropriate path name. For git checkout
, the appropriate path name depends where you are in your work-tree. For git show
, the appropriate path name is the full path or starts with ./
. The syntax for git checkout
is:
git checkout <commit> -- <path>
and the syntax for git show
is:
git show <commit>:<path>
and you just have to remember that they're different like this.
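Putting the whole recipe together, here is a sketch that assumes the remote is called origin and its branch is master. The directories seed, origin.git, and mine are hypothetical stand-ins for a real server and a real clone, and the file contents are invented.

```shell
set -eu
# Identity for every repository in this demo; values are illustrative.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# A stand-in for the server: one commit containing feed/app.py.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
mkdir seed/feed
echo "version 1" > seed/feed/app.py
git -C seed add feed/app.py
git -C seed commit -q -m "first version"
git clone -q --bare seed origin.git

# Your clone, with a local edit you want to throw away.
git clone -q origin.git mine
cd mine/feed
echo "local edit" > app.py

git fetch -q origin
shown=$(git show origin/master:./app.py)   # view their version only
git checkout origin/master -- app.py       # replace the local file with it
restored=$(cat app.py)
echo "$shown / $restored"
```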
Upvotes: 1
Reputation: 36720
I think you are looking for
git checkout sec_aggregator/feed -- app.py
This will check out app.py
from branch feed
defined on remote
sec_aggregator
Upvotes: 0