gregsdennis

Reputation: 8428

Discover branches with changes under a path

I plan on extracting a certain folder path in a GitHub repository to a new repository. For example:

- repository/
  - src/
    - primaryCode/
    - codeToExtract/
  - ci/
  - ...

I'm going to migrate codeToExtract to a new repository.

Is there a way to find branches that have changes to that folder? This is a team project, so manually checking them all is not an option.

Upvotes: 1

Views: 135

Answers (3)

jthill

Reputation: 60393

Instead of a separate search, just do it:

git clone -ns . ../extract
cd $_
git filter-branch \
        --subdirectory-filter src/codeToExtract \
        -- --all -- src/codeToExtract

The second set of parameters above (after the first --) selects which branches to rewrite; the third set (after the second --) limits the rewrite to the paths you care about.

Then (as always, Git doesn't care about repo boundaries or names itself, they're conveniences, only histories matter) push/fetch whatever resulting histories you want into any repo you want, under any names you want.
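As a rough illustration of that last point, the sketch below pushes a commit from one repository into another under a different branch name. The directory names source and target.git, the identity, and the branch name extracted are all made up for the example; substitute the rewritten repository and your real destination.

```shell
# Throwaway setup: one ordinary repository and one empty bare repository.
work=$(mktemp -d) && cd "$work" || exit 1
git init -q source
git init -q --bare target.git
cd source || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"

echo hello > file.txt && git add file.txt && git commit -q -m "some history"

# Push the current commit into the other repository under a name of our choosing.
git push -q ../target.git HEAD:refs/heads/extracted

# Both repositories now hold the same commit, reachable by different names.
local_tip=$(git rev-parse HEAD)
pushed_tip=$(git --git-dir=../target.git rev-parse refs/heads/extracted)
echo "local:  $local_tip"
echo "pushed: $pushed_tip"
```

The refspec HEAD:refs/heads/extracted is what decouples the name on the receiving side from whatever the branch is called locally.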

Upvotes: 0

torek

Reputation: 489083

The question is ill-formed, but once the form is fixed, the answer is probably yes. But which yes, and which answer you want, depends on what you really mean.

Each commit holds a snapshot. No commit holds changes, in the same way that if you have several photos (perhaps of yourself at different ages) no single photo holds any changes. But maybe you had longer (or more) hair in one photo than in another, so that by comparing two photos, you can observe changes.

The problem, as you can probably now see, is that you have to pick two snapshots. Which two do you care about? You can pick any two, but only two—or, well, two at a time.

What Git cares about are the various commits. Each one, as we just said, holds a snapshot—but it also holds a bit more. It holds the name and email address of the person who made the snapshot, for instance. It holds a date-and-time-stamp. (Actually it has both an author and a committer, giving two names, email-addresses, and timestamps.) It has a log message, written by whoever made the commit to tell you why they made that commit. And, each commit stores the hash ID of its parent commit (or commits, in the case of a merge commit). This extra stuff is the metadata for the commit, with the main data being the source snapshot.

Each commit has its own unique hash ID. This hash ID, which appears to be random, is actually just a cryptographic checksum of the contents of that commit (data + metadata). That hash ID is how Git actually finds the commit—how it retrieves the commit's contents (data + metadata) from the main database that Git stores. You have seen these hash IDs in git log output, and abbreviated versions all over the place—Git desperately needs them, since they're the actual names of internal Git objects, so it's inevitable that Git will show you some of them. They look like b5101f929789889c2e536d915698f58d5c5c6b7a for instance. They are pretty useless to humans though: they're far too hard to remember; I have to cut and paste them to get them right.
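If you want to look at this metadata directly, git cat-file -p prints the raw contents of a commit object. Here is a minimal sketch in a throwaway repository (the file name and identity are invented for the example):

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -am "second"

# Raw commit object: tree (the snapshot), parent, author, committer, log message.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"

# The stored parent line is the same hash ID that HEAD^ resolves to.
parent_in_commit=$(printf '%s\n' "$commit_text" | sed -n 's/^parent //p')
parent_resolved=$(git rev-parse HEAD^)
```

The tree line names the snapshot; everything else in the output is the metadata described above.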

Given any one particular commit hash ID, Git can fish out the commit and its metadata. That metadata includes the commit's parent commit hash ID, so Git can now fish out the parent as well. Then Git can compare the two commits, and that's what you see in, e.g., git log -p output: the result of this comparison. Both git log and git show reduce the complete snapshot to a set of changes, as compared to this commit's parent commit. That's where the two snapshots come from.
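To make the "two snapshots" idea concrete, this sketch compares a commit against its parent explicitly with git diff, then lets git show do the same comparison implicitly. The repository layout borrows the folder names from the question; the file names are made up.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir -p src/primaryCode src/codeToExtract
echo a > src/primaryCode/a.txt
echo b > src/codeToExtract/b.txt
git add . && git commit -q -m "initial snapshot"

echo b2 > src/codeToExtract/b.txt
git commit -q -am "edit the folder to extract"

# Explicit comparison of two snapshots: the parent, then the commit itself.
diff_names=$(git diff --name-only HEAD^ HEAD)
# git show picks the parent for us and reports the same set of files.
show_names=$(git show --name-only --format= HEAD)
echo "$diff_names"
```

Both commands list only src/codeToExtract/b.txt, even though the commit's snapshot contains every file.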

Now, because a commit has the hash ID of its parent, which has another hash ID of its parent, and so on, we can draw the commits as a long series of backwards-pointing nodes, with each node representing the commit and the arrow coming out of that node as the hash ID of the parent:

... <-o <-o <-o ...

But to get this process started, we have to know some starting (ending?) point hash ID. We could write down those big ugly hash IDs, or cut and paste them a lot, but we have a computer. Why not have the computer save the hash ID for us? This is where branch names come in.

What a branch name is, really, is a place to store one (1) hash ID. We store the hash ID of the last commit on the branch:

...--F--G--H   <-- master (HEAD)

(Here I've used uppercase letters like H in place of an actual hash, just so they're easier to talk about.) To make a new commit, we fiddle with source code in our work-tree, use git add to tell Git to update its ready-to-snapshot copies of files, and then use git commit to collect the metadata and make a new snapshot. This gets a new, unpredictable hash ID. Remember, one of the inputs is the time, so even if we predict the source and our name and log message and so on, we won't know what the hash ID will be until we press Enter or click a "make commit" button or whatever.

In any case we get a new commit with a new hash ID that we can just call I:

...--F--G--H   <-- master (HEAD)
            \
             I

I's parent is H. Now comes the sneaky yet masterful trick: Git writes the actual hash ID of commit I into the current branch name, master. We can straighten out our drawing as we now have:

...--F--G--H--I   <-- master (HEAD)

We have a new snapshot, whose parent is the old snapshot.
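You can watch the branch name move with git rev-parse. A small sketch, assuming a scratch repository whose branch is forced to be called master so it matches the drawings (the commit messages H and I are just labels):

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master
git config user.name "Example Author"
git config user.email "author@example.com"

echo 1 > f && git add f && git commit -q -m "H"
tip_before=$(git rev-parse master)        # hash ID stored in the name: commit H

echo 2 > f && git commit -q -am "I"
tip_after=$(git rev-parse master)         # the name now stores commit I
parent_of_tip=$(git rev-parse master^)    # I's parent is H

echo "master was $tip_before"
echo "master is  $tip_after (parent $parent_of_tip)"
```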

If we create a new branch now, we get two names pointing to commit I:

...--F--G--H--I   <-- feature, master (HEAD)

Note that all the commits are on both branches. We can switch which branch has HEAD attached to it using git checkout feature:

...--F--G--H--I   <-- feature (HEAD), master

and now if we make a new commit J it will be only on feature:

...--F--G--H--I   <-- master
               \
                J   <-- feature (HEAD)
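The "on both branches" claim can be checked by counting commits reachable from one name but not the other. This is only a sketch in a scratch repository; the commit messages I and J mirror the drawing.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master
git config user.name "Example Author"
git config user.email "author@example.com"

echo 1 > f && git add f && git commit -q -m "I"
git checkout -q -b feature                # new name, same commit, HEAD attached
echo 2 > f && git commit -q -am "J"

# Commits reachable from feature but not from master: just J.
only_on_feature=$(git rev-list --count master..feature)
# Commits reachable from master but not from feature: none.
only_on_master=$(git rev-list --count feature..master)
# Branch names whose history contains master's tip (commit I): both.
containing_tip=$(git for-each-ref --contains master --format='%(refname:short)' refs/heads/)

echo "only on feature: $only_on_feature, only on master: $only_on_master"
echo "$containing_tip"
```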

You now have most of the pieces you need to answer your own question

"I'm going to migrate codeToExtract to a new repository."

Presumably you mean that you intend to take files whose names live in that directory / folder, out of some commit(s), and put them in a new repository. So far so good.

"Is there a way to find branches that have changes to that folder?"

As you now know, neither branches nor commits have changes, but branches do let you find commits, and if you pick any two particular snapshots (commits), you can compare them.

Remember that some commits may be on many branches. It's up to you what you want to do with this, if anything. It's also up to you to decide whether to compare each commit you examine to its parent(s), or to some fixed starting or ending point commit snapshot. You might, for instance, have a graph that includes, in part:

          o--o--*--K
         /          \
...--o--*--o--*--L---M--o   <-- br1
      \
       o--*--o--o   <-- br2

where each * commit has, when compared with its parent, some differences in files in the one folder in question.

You also need to decide what to do about merge commits. These are commits with more than one parent. I've given the one interesting merge commit above the letter M, and given each of its two parents the letters K and L (though in reality they'll all just have big ugly hash IDs). Merge commit M has a snapshot, just like any other commit. But it's hard to compare it to its parent, because it doesn't have one parent, it has two parents.

It's up to you to figure out what to do about this. If you decide to take (files from) both * commits that are parents of K and L respectively, you'll probably want to take (files from) commit M as well, even if those files match the ones in K and/or L.

It's possible that you don't care about any of this: maybe you only want to look at the tip commit of each branch, and compare each of those to all the other such tip commits to figure out which version(s) of the file(s) from the one folder you want. If that's what you want, you can use git diff to do these comparisons: give git diff two commit hash IDs and it will compare the snapshots in those two commits. Give it two branch names like master and feature, or br1 and br2, and it will compare the snapshots of the two commits identified by those names, without doing any parent-link-following.
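For the tip-to-tip comparison, git diff accepts a path after -- to restrict the comparison to one folder, and --quiet turns the result into an exit status. A sketch under assumptions: a scratch repository, a base branch called master, and branches br1 and br2 that each change a different folder.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir -p src/primaryCode src/codeToExtract
echo base > src/primaryCode/app.txt
echo base > src/codeToExtract/lib.txt
git add . && git commit -q -m "base"

git checkout -q -b br1
echo changed > src/primaryCode/app.txt && git commit -q -am "br1: other folder only"

git checkout -q -b br2 master
echo changed > src/codeToExtract/lib.txt && git commit -q -am "br2: folder of interest"

# --quiet prints nothing and exits non-zero when the two snapshots differ.
report=$(for b in br1 br2; do
    if git diff --quiet master "$b" -- src/codeToExtract; then
        echo "$b same"
    else
        echo "$b differs"
    fi
done)
printf '%s\n' "$report"
```

This compares only the two tip snapshots. A branch that changed the folder and later changed it back would show up as "same".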

Once you figure out what answer you want—or what question you want to answer—you can use this to get what you want.

Upvotes: 0

phd

Reputation: 94676

path=src/codeToExtract    # the folder from the question
git for-each-ref --format='%(refname)' refs/heads/ |
while read -r branch; do
    # rev-list prints a hash only if some commit reachable from the branch touches $path
    if test -n "$(git rev-list -n1 "$branch" -- "$path")"; then
        echo "$branch"
    fi
done

Explanation:

git for-each-ref --format='%(refname)' refs/heads/ - list all local branches
while read -r branch - run the loop over every branch
git rev-list -n1 "$branch" -- "$path" - find the most recent commit reachable from the branch that touches $path
if test -n … echo "$branch" - if at least one commit is found, print the branch name.
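Note that the loop above reports a branch if any commit in its whole history touches the path, so a folder that already exists on the main line will match every branch. One possible variant, assuming your base branch is called master, is to look only at commits that are not yet on the base by giving rev-list the range master..branch. The scratch repository and branch names below are invented to show the effect.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir -p src/primaryCode src/codeToExtract
echo base > src/primaryCode/app.txt
echo base > src/codeToExtract/lib.txt
git add . && git commit -q -m "base"

git checkout -q -b br1
echo changed > src/primaryCode/app.txt && git commit -q -am "br1: other folder only"
git checkout -q -b br2 master
echo changed > src/codeToExtract/lib.txt && git commit -q -am "br2: folder of interest"

path=src/codeToExtract
base=master   # assumption: the branch everything forked from

# Only commits reachable from the branch but not from $base are considered.
found=$(git for-each-ref --format='%(refname:short)' refs/heads/ |
while read -r branch; do
    if test -n "$(git rev-list -n1 "$base..$branch" -- "$path")"; then
        echo "$branch"
    fi
done)
echo "$found"
```

Here only br2 is printed: master has nothing beyond itself, and br1's extra commit does not touch the path.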

Upvotes: 3
