Jir

Reputation: 3155

List the content of a directory for a specific git commit using GitPython

Using GitPython, I'm trying to list the content of a directory at a given commit (i.e. a "snapshot" of the directory at the time).

In the terminal, what I'd do is:

git ls-tree --name-only 4b645551aa82ec55d1794d0bae039dd28e6c5704

How can I do the same in GitPython?

Based on the answers I've found to a similar question (GitPython get tree and blob object by sha) I've tried recursively traversing base_commit.tree and its .trees, but I don't seem to get anywhere.

Any ideas?

Upvotes: 7

Views: 4408

Answers (3)

alfinkel24

Reputation: 551

If you know the path to the directory, say foo/bar/baz, and you have a GitPython Commit object, say commit, then you can access the blobs in that directory with commit.tree['foo']['bar']['baz'].blobs. The names of those blobs (files) give you the list of files in the directory at that commit.

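import git

repo = git.Repo('path/to/my/repo')
# take the most recent commit; any other Commit object works the same way
commit = next(repo.iter_commits(max_count=1))
# .blobs holds only the files directly inside the directory, not subdirectories
files_in_dir = [b.name for b in commit.tree['foo']['bar']['baz'].blobs]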
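
The `.blobs` attribute only covers files, whereas `git ls-tree --name-only` also prints subdirectories. One way to get closer to that output is to iterate the tree itself. This is a minimal sketch, assuming `commit` is a GitPython Commit object; the helper name `list_dir_entries` is made up for illustration:

```python
def list_dir_entries(commit, path=""):
    """Return the names of the direct children of `path` at `commit`.

    `path` is a slash-separated directory path relative to the repo root;
    an empty string means the root tree.
    """
    tree = commit.tree
    if path:
        # Indexing a Tree with a string looks up the entry at that path.
        tree = tree[path]
    # Iterating a Tree yields its direct entries (blobs, subtrees and
    # submodules) without descending into subdirectories.
    return [entry.name for entry in tree]
```

With a real repository this could be called as `list_dir_entries(repo.commit('4b645551aa82ec55d1794d0bae039dd28e6c5704'), 'foo/bar/baz')`.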

Upvotes: 1

alfalfasprout

Reputation: 271

Indeed, traversing the trees and subtrees is the right approach. However, the built-in traverse method can have issues with submodules. Instead, we can do the traversal ourselves iteratively and collect all the blob objects (the files in the repo at a given commit). There's no need to use execute.

def list_files_in_commit(commit):
    """
    Lists all the files in a repo at a given commit

    :param commit: A gitpython Commit object
    :return: list of file paths relative to the repo root
    """
    file_list = []
    dir_list = []
    stack = [commit.tree]
    while stack:
        tree = stack.pop()
        # enumerate blobs (files) at this level
        for b in tree.blobs:
            file_list.append(b.path)
        # record each subdirectory and queue it for traversal
        for subtree in tree.trees:
            dir_list.append(subtree.path)
            stack.append(subtree)
    # you can return dir_list if you want directories too
    return file_list

If you want the files affected by a given commit, this is available via commit.stats.files.
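
As a rough illustration of that last point, here is a small sketch, assuming `commit` is a GitPython Commit object; the helper name `files_changed_by` is hypothetical:

```python
def files_changed_by(commit):
    """Return the sorted paths touched by `commit`."""
    # commit.stats.files is a dict keyed by file path; each value holds
    # per-file counts such as insertions and deletions.
    return sorted(commit.stats.files)
```

Note that this lists what changed in the commit, not the full snapshot of the directory, so it answers a different question from list_files_in_commit above.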

Upvotes: 5

Jir

Reputation: 3155

I couldn't find a more elegant way than actually calling execute. This is the end result:

# splitlines() rather than split(), so names containing spaces stay intact
configFiles = repo.git.execute(
    ['git', 'ls-tree', '--name-only', commit.hexsha, path]).splitlines()

where commit is a git.Commit object and path is the path I'm interested in.
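
The same command can probably also be run through GitPython's dynamic command wrapper instead of execute. A sketch, assuming `repo` is a git.Repo and `commit` a git.Commit; the function name `ls_tree_names` is only for illustration:

```python
def ls_tree_names(repo, commit, path):
    """Run `git ls-tree --name-only <sha> <path>` and return one name per entry."""
    # repo.git.ls_tree(...) runs `git ls-tree ...`; GitPython maps the
    # underscore in the method name to a dash.
    output = repo.git.ls_tree("--name-only", commit.hexsha, path)
    # git prints one name per line, so split on line boundaries only.
    return output.splitlines()
```

Splitting on lines rather than on arbitrary whitespace keeps a name like "my file.txt" as a single entry.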

Upvotes: 1
