Reputation: 5488

Git checkout pathspec error. Command to get the correct path

What am I trying to do:

Replace a single local file (app.py) with the previously committed remote file.

What have I tried:

I can git add a file as follows:

git add app.py
git commit -m "added x "
git push -u sec_aggregator

But when I want to replace the local app.py on my pc with the remote one I tried this:

git fetch
git checkout sec_aggregator/feed/app.py

and got this error:

error: pathspec 'sec_aggregator/feed/app.py' did not match any file(s) known to git

How do you get the correct path please?

Update:

When I do the command git branch

I get this output:

  list
* master

Upvotes: 1

Answers (3)

user11547066

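To replace a file with the previously committed version you can do:

git checkout app.py

This restores app.py from the index, which matches the last commit unless you have already staged changes to the file. To discard staged changes as well, name the commit explicitly:

git checkout HEAD -- app.py

Note that git reset --hard does not accept a file path; it resets the entire work-tree to the given commit.

Please make sure that you don't have unsaved changes to app.py in the working directory before running any of these. These operations are not working-directory safe, so your local changes to that file will be lost.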

Upvotes: 0

torek

Reputation: 488183

Let me start with a question. You ran:

git add app.py

and apparently this worked (you show no error and the subsequent git commit seems to have succeeded as well). So, should you now run:

git checkout sec_aggregator/feed/app.py

? Or would it make more sense to run:

git checkout app.py

? That is, why do you expect to use the full file name sec_aggregator/feed/app.py in git checkout, and the partial (relative to current directory) file name app.py in git add? Have you navigated to a different directory / folder within your work-tree?

What you probably want here is:

git checkout <some-commit-specifier> -- app.py

since git checkout will use your current folder/directory within your work-tree to resolve the file's full name, the same way that git add does. The some-commit-specifier part here may be as simple as HEAD~1.
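Here is a minimal sketch of that command in action. It assumes a throwaway repository in a temporary directory; the file contents, commit messages, and identity settings are all invented for the demonstration.

```shell
# Throwaway repository; every name and file content here is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir feed
echo "version 1" > feed/app.py
git add feed/app.py
git commit -q -m "first version"

echo "version 2" > feed/app.py
git add feed/app.py
git commit -q -m "second version"

# From inside feed/, the path is relative, exactly as it is for git add.
cd feed
git checkout HEAD~1 -- app.py
restored=$(cat app.py)
echo "$restored"   # prints: version 1
```

The -- separates the commit specifier from the path, so Git never has to guess which argument is which.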

This might be all you need to make progress; if so, feel free to ignore the rest of this answer. :-) But this is not the only unusual thing in your question. As Julian noted, you seem to be using the string sec_aggregator as the name of a remote. This is a pretty unusual remote-name; most people have a repository with only one remote, named origin, and some have a repository in which they have added a second remote, typically called upstream. It is possible to use pretty much any alphanumeric name as a remote, so sec_aggregator is OK here, but I suspect it's not what you mean to use. (What was the exact output of this git push anyway?)

Long: about Git repositories, and working with them

So, let's take a look at what a Git repository is, and how different Git repositories talk to each other, because this all relates to your ultimate goal, which I'll repeat here:

[I want to r]eplace a single local file (app.py) with the previously committed remote file.

Although there is a lot of extra machinery to make this all work, at its heart, a Git repository is a collection of commits.¹ That is, Git doesn't store files but rather commits. This collection is a sort of key-value database, with the keys being commit hash IDs (which we will get to in a moment). Each commit itself stores files—a full, complete snapshot of each file as of its state at the time you (or whoever) made the commit—so if you go down one level, into the commits, you do get files; but the storage-unit, as it were, is the commit.

The files stored in a commit are frozen for all time. They can never be changed: not by you, nor by Git. As you can imagine, re-freezing every file for every commit could use up a lot of storage space. So Git doesn't actually do that. Instead, each frozen file is in a special, read-only, Git-only, compressed form. I like to call this the freeze-dried format, although that's not a formal Git term. That means they take less space, and sometimes a Git repository database is smaller than the files it stores. But there's an even more useful trick here. If the previous version of a file is frozen, and we need to make a new commit that has the same version, why not just re-use the freeze-dried copy of the file? And that's exactly what Git does. Commits keep re-using existing freeze-dried file copies, so that the growth of the .git directory holding the internal database is controlled.
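You can observe this re-use yourself. The sketch below assumes a scratch repository with two invented files, a.txt and b.txt, and asks Git for the internal object ID of each file in two successive commits.

```shell
# Scratch repository; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "unchanged" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo "two" > b.txt          # only b.txt changes in the second commit
git add b.txt
git commit -q -m "second"

# <commit>:<path> names the stored file object inside that commit.
old_a=$(git rev-parse HEAD~1:a.txt)
new_a=$(git rev-parse HEAD:a.txt)
old_b=$(git rev-parse HEAD~1:b.txt)
new_b=$(git rev-parse HEAD:b.txt)

echo "a.txt: $old_a -> $new_a"   # same ID: the stored copy is shared
echo "b.txt: $old_b -> $new_b"   # different IDs: a new copy was stored
```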

So, each commit stores files—i.e., data—but that's not all. A commit also stores some metadata, some information about the commit itself. For instance, each commit has an author and a committer, usually the same person. When you make a commit, you become the author-and-committer. Each commit has a date-and-time-stamp too, or perhaps I should say two, as there's one for the author line and one for the committer line. You also get to supply a log message giving a reason for the commit. But there is one more key item in every commit, which Git calls the parent, or for merge commits, parents, plural.

Now, every commit acquires—at the point it comes into existence—a new and unique hash ID. The hash ID is a big ugly string of letters and numbers that, technically, is the hexadecimal representation of a cryptographic checksum of the contents of that commit. One reason we can't know what the hash ID of a new commit will be until we've made it is that date-and-time-stamp part: if you make a commit, then remove it,² then make it again, the second one will have a different time in it and will actually be a different commit, with a different hash ID, even if it stores all the same file contents.

The hash ID is, in effect, the true name of a commit. Crucially, every Git in the universe will agree that that hash ID—whatever it is—is the one and only correct hash ID for that particular commit: the one with that snapshot of those files, made by you, with your log message, on the date-and-time at which you made it. So Git can use this hash ID as a unique key, in a key-value database, to store and retrieve this commit, and—importantly—your Git can call up another Git and ask it: Do you have this hash ID? If they—the other Git—do have that hash ID, they have your commit. If not, they don't.

In addition to all of this, your Git stores the hash ID of the previous commit—the one that goes before this commit—in the metadata for this commit. So, given a commit, Git can look at its parent and get the hash ID of the previous commit. Your Git can then retrieve that commit from its database of commits. That commit has a snapshot of all of the files as of the time you (or whoever) made that commit, and it, too, has a parent hash ID, which locates yet another earlier commit, and so on.
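To see the parent link directly, you can dump a commit's raw contents with git cat-file -p. This is only a sketch in a scratch repository; the file name and commit messages are assumptions for illustration.

```shell
# Scratch repository with two commits; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)      # hash ID of the first commit

echo "two" > file.txt
git add file.txt
git commit -q -m "second"

# Print the raw commit: tree, parent, author, committer, then the message.
git cat-file -p HEAD

# The "parent" line of the second commit holds the first commit's hash ID.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
echo "$parent_line"
```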

This means that Git can start with the most recent or last commit, at the end of a long backwards-looking chain, and work back to the very first commit you (or whoever) made. This very-first commit does not have any parent hash ID, simply because it can't: there was no earlier commit to connect back to. Except for cases where your string of commits branches or merges—which I won't cover properly here—this means your Git repository structure is really simple:

... <-F <-G <-H

Here, the uppercase letters stand in for the actual hash IDs of each commit. We draw them as pointing to their parent commits. Now Git just needs some way to remember the actual hash ID of the last commit, H, and that's where a branch name comes in:

...--F--G--H   <-- master

The branch name master holds the raw hash ID of commit H, so that H is the last commit on master. To make a new commit, you modify some file, git add it, and run git commit. The git commit gathers your log message (in this case, from your -m argument) and uses your name and email address and the current date-and-time to set up most of the metadata. It uses the hash ID of the current commit H, which is stored in the name master, as the parent for the new commit. It freezes all the files into the new commit, which we'll call I. Let's draw it in:

...--F--G--H   <-- master
            \
             I

Now that commit I exists, and has frozen-for-all-time copies of all of the files—a new snapshot—Git simply changes the hash ID stored in the name master, so that master now points to commit I instead of commit H:

...--F--G--H
            \
             I   <-- master

Like each stored file's contents, the commit's contents can never be changed.³ All commits are permanent (mostly—see footnote 2) and read-only (totally).
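The way the name master moves from H to I can be watched with git rev-parse. This is a sketch in a scratch repository, with invented file names; the variables before and after are just labels for the two hash IDs.

```shell
# Scratch repository; "before" plays the role of H and "after" the role of I.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "commit H"
before=$(git rev-parse master)    # master points to H

echo "two" > file.txt
git add file.txt
git commit -q -m "commit I"
after=$(git rev-parse master)     # master now points to I

parent=$(git rev-parse master~1)  # I's parent is still H, unchanged
echo "H=$before I=$after parent-of-I=$parent"
```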

¹This extra machinery is deliberately exposed, in Git, so it's possible to (ab)use Git to store files directly, e.g., through the use of tags. But that's not how it's designed to work.

²It's a little bit tricky to remove a commit, but it is possible. Essentially, you have to make the commit un-find-able first. This gets into the notion of reachability in a graph, which again we won't go into here, but see Think Like (a) Git for more about this.

³The reason for this is that the hash IDs are cryptographic checksums of the contents. Make any change, and what you have is a new and different internal object, with a new and different checksum. The old object is still there in the database. You didn't change a file, or a commit: you just made a new file, or a new commit.

Quick recap of the above

Every commit has a unique hash ID.
Each commit stores a frozen copy of all files forever (or as long as the commit itself continues to exist).
Each commit has some set of parent hash IDs, usually just the one immediate parent. (The very first commit has no parent. A merge commit starts with the usual parent, but then has more; we won't get into the details here.)
Nothing in any commit can ever be changed.
A branch name like master simply identifies the last commit in the chain. From here, Git can work backwards to earlier commits.
So Git finds commits by starting with a name, like master, and working backwards.
One Git repository can talk to another. When it does so, it talks about what it has by hash ID. The hash IDs are the universal exchange method.
You, as a human, will generally use names like master. But you can run git log or use other tricks to find the hash IDs, which may matter in a moment.

A brief-ish look at the index and work-tree

Commits, and their stored files, are frozen for all time. That's great for archival, but useless for getting any new work done. You need to be able to take all the files out of some commit, un-freezing and re-hydrating them. Git therefore sets up a work area, which Git calls your work-tree or working tree or other variations along this line. This folder, plus any sub-folders Git needs to create, holds the files extracted from the commit. The files have full names, e.g., feed/app.py, and your computer requires a sub-folder named feed to hold a file with that name. These folders aren't stored in Git;⁴ the files just have full names that force Git to create folders to hold them.

In any case, checking out some branch-and-commit, as in git checkout master, tells Git: extract all the files from that commit, into my work-tree, so that I can see them and work with them. The branch is the argument you gave to git checkout, and the commit is based on the hash ID stored in the branch name. In our drawings above, that was first commit H, then commit I after we made the new one.

As we saw above, to make a new commit, you can just work with the work-tree file—it's an ordinary file and you can do anything to it that your computer will let you do to it—and then run git add on it. But why do you have to git add the file every time you change it?

This is where Git's index comes in. The index or staging area (or sometimes, rarely these days, the cache) holds a copy of each file that git checkout checked out. This copy is in the freeze-dried format, ready to go into the next commit.⁵ Initially, that's just the actual copy from the previous commit. Running git add has Git compress / freeze-dry the updated contents and put those into the index.

What this means is that, at all times, the index holds copies of the files you propose to put into the next snapshot.⁶ Initially, the index matches the current commit. You then change work-tree files, but the index still matches the current commit: nothing new is staged for commit yet. Then you git add the file to replace the index copy with a freeze-dried version of the work-tree copy. Now something is staged for commit.
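One way to watch the index copy change is git ls-files -s, which lists the object ID that the index holds for each file. The following is a sketch under the usual assumptions: a scratch repository, an invented README.md, and a small helper function index_id that exists only in this example.

```shell
# Scratch repository; index_id is a helper defined here, not a Git command.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "first draft" > README.md
git add README.md
git commit -q -m "initial"

# ls-files -s prints: <mode> <object-id> <stage> <path>; keep the object ID.
index_id() { git ls-files -s README.md | awk '{print $2}'; }

at_start=$(index_id)
head_id=$(git rev-parse HEAD:README.md)   # same as the index right after a commit

echo "second draft" > README.md
after_edit=$(index_id)                    # editing the work-tree does not touch the index

git add README.md
after_add=$(index_id)                     # git add replaces the index copy
work_id=$(git hash-object README.md)      # ID Git computes for the work-tree contents

echo "$at_start $after_edit $after_add"
```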

In other words, at all times, you have three active copies of each file. Let's consider the file README.md for a nice concrete example. Note that the syntax with the colon in it is special to Git, i.e., HEAD:README.md and :README.md won't work with most commands on your computer. But git show, and some other Git commands, use this commit:path syntax. (Annoyingly, the colon means something different in git fetch and git push.)

HEAD:README.md is the frozen copy in the current commit. (Use git show HEAD:README.md to view it.) You cannot change this one, but it's easy to access.
:README.md is the frozen-format copy in the staging area, i.e., the index. (Use git show :README.md to view it.) You can change this one, by replacing it. Use git add to replace it from the third copy, which is:
README.md is an ordinary file in your work-tree. Use ordinary commands to view it or change it or whatever.
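The three copies can be made to differ on purpose, which makes it easier to see which is which. This is a sketch in a scratch repository; the file contents are invented labels.

```shell
# Scratch repository where the three copies of README.md all differ.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "committed" > README.md
git add README.md
git commit -q -m "initial"

echo "staged" > README.md
git add README.md                 # index copy now says "staged"

echo "work-tree" > README.md      # work-tree copy changed again, not added

head_copy=$(git show HEAD:README.md)   # frozen copy in the current commit
index_copy=$(git show :README.md)      # copy in the index / staging area
work_copy=$(cat README.md)             # ordinary file in the work-tree

echo "$head_copy / $index_copy / $work_copy"   # prints: committed / staged / work-tree
```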

In fact, all frozen copies in every commit are accessible at all times, but the one (or ones) in the current commit has (have) a special role. This is in part because HEAD plays a big part in what git status says.

The git status command will tell you that some files are staged for commit and other files are not staged for commit. To do that, git status runs two separate comparisons. The first one is HEAD-vs-index. The second one is index-vs-work-tree:

First, for each file in HEAD, compare it to the one in the index. Do they match? If so, say nothing. If not, say that this file is staged for commit.
Next, for each file in the index, compare it to the one in the work-tree. Do they match? If so, say nothing. If not, say that this file is not staged for commit.
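The same two comparisons can be run by hand with git diff: git diff --cached compares HEAD to the index, and plain git diff compares the index to the work-tree. This is a sketch in a scratch repository with two invented files; it illustrates the comparisons rather than how git status is implemented internally.

```shell
# Scratch repository; one file gets staged, the other is only edited.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "a" > staged.txt
echo "b" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

echo "a2" > staged.txt
git add staged.txt                # index now differs from HEAD for this file
echo "b2" > unstaged.txt          # work-tree now differs from index for this file

staged_list=$(git diff --cached --name-only)   # HEAD vs index
unstaged_list=$(git diff --name-only)          # index vs work-tree

echo "staged for commit:     $staged_list"
echo "not staged for commit: $unstaged_list"
git status --short
```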

There are some extra cases, e.g., files in your work-tree that aren't in your commit, or files that have been removed from your index and/or work-tree. But the above is the heart of what git status tells you about the next commit you would make if you ran git commit right now—the staged for commit files—or the commit you could make if you ran more git add commands.

Crucially, your work-tree, and the index that sits between your work-tree and the Git repository, is specific to this particular Git repository. When you have your Git call up some other Git, they will exchange commits. Your index and your work-tree are private: they cannot see yours. By the same token, their index, and their work-tree if they have one,⁷ are also private. You cannot see theirs.

⁴They sort of are, and sort of aren't. The end result is mostly "aren't" as you cannot store an empty directory in Git.

⁵Technically, the blob object is actually already in the Git database. The index just refers to that object by its hash ID. When you git add an updated file, this will create a new blob object if needed, and the index will now refer to the new blob object. If the contents you git add match those of some stored version, anywhere in the repository, Git will re-use the existing blob object.

⁶The index takes on an expanded role during a conflicted merge. It also has some other uses. Saying the index represents your next commit is not wrong, but not really complete—but this answer is already long enough.

⁷If you use GitHub or Bitbucket or any other similar web hosting service, they have bare repositories that have no work-tree. In order to git push to some server repository, that server repository is typically also created with --bare. This sidesteps a bunch of problems that could occur if you want to update a branch that they have checked-out. Without a work-tree, they cannot check out any branch.

Another quick recap

The special name HEAD refers to both your current branch and your current commit. (Git has two ways to ask about HEAD; one produces the branch name, the other produces the commit hash ID.) While the commits themselves cannot be changed, you can select any branch name for HEAD and therefore change which commit is the current commit. Adding a new commit automatically updates the branch name, so that the new commit you just made is the current commit now.
The index or staging area holds every file that will go into your next commit.
The work-tree is where you can see and work with your files. Git doesn't really need this at all: Git's main concern is with the repository itself, and then with the index since that's the source of new commits. But Git has to provide a work-tree, so that you can actually use Git.
The work-tree and index are private to this Git repository.
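The two ways of asking about HEAD mentioned in the recap can be sketched as follows, assuming a scratch repository whose only branch is named master.

```shell
# Scratch repository with a single commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "initial"

branch=$(git symbolic-ref --short HEAD)   # asks: which branch name is HEAD attached to?
commit=$(git rev-parse HEAD)              # asks: which commit hash ID is current?
tip=$(git rev-parse master)               # the hash ID stored in the branch name

echo "HEAD is on branch $branch at commit $commit"
```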

Reframing your original problem

Note that as humans, we tend to be concerned with our work-tree files. But Git isn't, not very much anyway. This all comes back to the key problem with your desire:

... I want to replace the local app.py on my pc with the remote one ...

There isn't a remote one, as far as Git itself is concerned. Your Git is going to call up another Git. That other Git has commits. Your Git either already has all of their commits, or will get any new ones from them if needed. That's what git fetch does: your Git calls up their Git, asks their Git about their branches and tags and so on, and collects from them any new commits they have that you don't. Once it has all their commits added to your database, git fetch is done and the two Gits stop talking to each other.

You may well already have all of their commits. If so, there's no need to git fetch here at all. That's probably the case above: you started with all of their commits, then you made one new commit and gave them that one via git push. You still have all of their commits (and they now have all of yours too).

In any case, now that you have all of their commits, you can select any of those commits—which are all in your repository now—and either check out that entire commit, or selectively extract individual files from individual commits. This is where git checkout gets quite messy. Git 2.23 adds new commands that you can use instead of git checkout, to help keep this stuff straighter, but we'll just talk about git checkout and git show here.

Using git checkout, you can tell your Git:

Extract one specific file from one specific commit. Copy this file into the index, then copy it from the index to the work-tree.

Using git show, you can tell your Git:

Find one specific file in one specific commit. Show this file to the standard output: the terminal window, or wherever I have redirected it.

Note that this git checkout will overwrite your current index copy. (You can fix or change this later if you want, using, e.g., git reset.) The git show won't do this.
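The difference in side effects can be checked with git diff --cached, which lists files whose index copy differs from HEAD. This is a sketch in a scratch repository with an invented app.py.

```shell
# Scratch repository with two versions of app.py.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "version 1" > app.py
git add app.py
git commit -q -m "first"
echo "version 2" > app.py
git add app.py
git commit -q -m "second"

# git show only prints the old version; the index is untouched.
git show HEAD~1:app.py > /dev/null
after_show=$(git diff --cached --name-only)

# git checkout copies the old version into the index and the work-tree.
git checkout HEAD~1 -- app.py
after_checkout=$(git diff --cached --name-only)

# git reset with a path puts the index copy back to the HEAD version.
git reset -q -- app.py
after_reset=$(git diff --cached --name-only)

echo "show: [$after_show] checkout: [$after_checkout] reset: [$after_reset]"
```

After the final git reset, the work-tree file still holds the old contents; only the index copy went back to the HEAD version.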

To achieve the first one, you must pick a commit. The easiest way is often to use git log to find the hash ID, then cut-and-paste that ID with your mouse:

git checkout <hash-id> -- path/to/file

To achieve the second one, pick a commit again, and run:

git show <hash-id>:path/to/file

There's a bit of weirdness here. With git checkout, the path to the file is relative to where you are in the work-tree, just as it is with git add. If you're in the sub-folder feed and the file is named feed/app.py, you just use app.py here. But with git show, you must use the full name of the file, or write git show hash-id:./app.py. (Internally, this has to do with the fact that git add and git checkout take arguments that Git calls pathspecs, while git show doesn't. But in practice it's just Git being messy and a bit hard to use.)
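The path difference is easy to trip over, so here is a sketch of it, run from inside an invented feed sub-folder of a scratch repository.

```shell
# Scratch repository with feed/app.py; commands run from inside feed/.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir feed
echo "hello" > feed/app.py
git add feed/app.py
git commit -q -m "initial"

cd feed

full=$(git show HEAD:feed/app.py)   # full name from the top of the repository: works
dot=$(git show HEAD:./app.py)       # ./ makes it relative to the current folder: works

# A bare relative name is looked up from the top level, where no app.py exists.
if git show HEAD:app.py > /dev/null 2>&1; then
    bare="found"
else
    bare="missing"
fi

git checkout HEAD -- app.py         # checkout takes the relative name directly
echo "$full / $dot / $bare"         # prints: hello / hello / missing
```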

There are a lot of ways to name commits

As shown above, you can name a commit by its hash ID. You can use git log, perhaps with --all --decorate --oneline --graph (one of my favorite Git commands), to find lots of hash IDs, which you can then cut and paste. You can use a branch name like master to mean the commit that the name points to. For instance, if your graph looks kind of like this (though git log --graph will draw it vertically instead of horizontally):

...--F--G--H   <-- master
         \
          K--L   <-- test

you can use the name test to mean commit L, just as the name master means commit H. And, when you have a remote like origin and run git fetch, your Git sets up all your remote-tracking names, like origin/master, to remember what your Git got from their Git when your Git called them up and said: Hey, what commits do you have, under what branch names?

So, if you had:

...--F--G--H   <-- master, origin/master

and then you added a new commit I to the end of your master, you now have:

...--F--G--H   <-- origin/master
            \
             I   <-- master

So now instead of finding the actual hash ID for commit H, you can use the name origin/master to identify it. Hence:

git show origin/master:feed/app.py

will let you see the version that is still in commit H, which is still identified by your origin/master.

Your origin/master will be automatically updated on each git fetch to origin: your Git calls up their Git, asks it what branches they have, and they say "I have master and it's commit big-ugly-hash-ID". Your Git gets that commit if you don't have it yet—along with all the earlier commits you need too—and now you have it. Then your Git sets your origin/master to remember that big ugly hash ID.

So, this gets us to the final, and maybe best, way to view or extract that file from their commit: run git fetch if needed, then use git show or git checkout with the name origin/master—assuming you call that other Git origin, as most people do—and the appropriate path name. For git checkout, the appropriate path name depends where you are in your work-tree. For git show, the appropriate path name is the full path or starts with ./. The syntax for git checkout is:

git checkout <commit> -- <path>

and the syntax for git show is:

git show <commit>:<path>

and you just have to remember that they're different like this.
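Putting it together, here is a sketch of the whole round trip. It assumes a local bare repository standing in for the server, a remote named origin, and an invented feed/app.py; with a differently named remote (such as sec_aggregator) the remote-tracking name changes accordingly.

```shell
# A local bare repository stands in for the remote; all names are illustrative.
base=$(mktemp -d)
git init -q --bare "$base/server.git"
git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$base/server.git"

mkdir feed
echo "pushed version" > feed/app.py
git add feed/app.py
git commit -q -m "pushed"
git push -q -u origin master              # origin/master now names this commit

echo "local-only version" > feed/app.py
git add feed/app.py
git commit -q -m "local only"             # master moves on; origin/master does not

git fetch -q origin                       # refresh origin/master (nothing new here)

remote_copy=$(git show origin/master:feed/app.py)   # view their version
git checkout origin/master -- feed/app.py           # replace index and work-tree copy
restored=$(cat feed/app.py)
echo "$restored"   # prints: pushed version
```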

Upvotes: 1

Julian

Reputation: 36720

I think you are looking for

git checkout sec_aggregator/feed app.py

This will check out app.py from the branch feed on the remote sec_aggregator.