Follow Git history via commit parents in Script visits same commit twice

Question

I am trying to write a script that performs a check on every commit, and for that check I need to know the parents of the commit. After the check, I follow the same procedure with the parent commits.

My problem is that I encounter the same commit multiple times, so unless I have a cycle in my repository, I am probably doing something wrong.

import subprocess

def parents(rev):
  args = ['git', 'rev-list', '--parents', '-n', '1', rev]
  output = subprocess.check_output(args, stderr=subprocess.PIPE).decode()
  items = output.split()
  return items[1:]  # First SHA is the ID of the revision that we passed into the command

revisions = parents('HEAD')
visited = set()
while revisions:
  rev = revisions.pop()
  assert rev not in visited, rev
  visited.add(rev)
  print(rev)  # TODO: Do check on commit
  revisions += parents(rev)

I would expect this to print something similar to git rev-list HEAD, but the assertion fires after a while.

Why do I encounter the same commit twice with this method? Is my assumption incorrect that following the parents of a commit allows me to traverse the full history?

larsks · Accepted Answer

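The behavior you are seeing is not a cycle and not a bug in git rev-list --parents. It follows from the shape of the history: as soon as there is a merge, the same ancestor can be reached along more than one path. Consider a repository that looks like this: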

A--B--C
 \   /
   D

The output of git log --oneline might be:

0000004 (HEAD -> master) Merge branch "mybranch"
0000003 B
0000002 D
0000001 A

But commit A is the parent of both B and D. So for B:

$ git rev-list --parents -n1 B
0000003 0000001

And for D:

$ git rev-list --parents -n1 D
0000002 0000001

You see commit A listed twice, which is exactly what is triggering the issue in your question.
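If you want to keep the parent-following approach, a minimal sketch is to skip commits that were already visited instead of asserting on them. The names `git_parents` and `walk_history` and the `get_parents` callback are illustrative choices, not anything Git provides; the toy graph at the end stands in for the A/B/C/D repository above so the traversal can be tried without a real repository. Unlike the script in the question, this sketch also yields the starting revision itself.

```python
import subprocess

def git_parents(rev):
    """Return the parent SHAs of rev, using the same command as the question."""
    args = ['git', 'rev-list', '--parents', '-n', '1', rev]
    output = subprocess.check_output(args, stderr=subprocess.PIPE).decode()
    return output.split()[1:]  # first SHA is rev itself

def walk_history(start, get_parents=git_parents):
    """Yield every commit reachable from start exactly once."""
    visited = set()
    pending = [start]
    while pending:
        rev = pending.pop()
        if rev in visited:
            continue  # already reached through another child, so skip it
        visited.add(rev)
        yield rev
        pending.extend(get_parents(rev))

# Toy graph mirroring the diagram: C merges B and D, both children of A.
toy = {'C': ['B', 'D'], 'B': ['A'], 'D': ['A'], 'A': []}
order = list(walk_history('C', toy.__getitem__))
```

In a real repository, `for rev in walk_history('HEAD'):` would run the check once per commit, at the cost of one git process per commit.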

Depending on what you're trying to do, the easiest solution might be to iterate over the output of git rev-list HEAD, which lists each commit only once.
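Since the check in the question needs the parents of each commit, one possible variation is to run git rev-list --parents HEAD once and parse it: each output line holds a commit followed by its parents. The sketch below assumes that output format; `parse_rev_list` and `history_with_parents` are hypothetical helper names, and the sample string reuses the abbreviated IDs from the example above rather than real output.

```python
import subprocess

def parse_rev_list(output):
    """Map each commit SHA to its list of parent SHAs, in output order."""
    parents_of = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:  # ignore blank lines
            parents_of[fields[0]] = fields[1:]
    return parents_of

def history_with_parents(rev='HEAD'):
    """Run git once and return {commit: [parents]} for everything reachable from rev."""
    output = subprocess.check_output(['git', 'rev-list', '--parents', rev]).decode()
    return parse_rev_list(output)

# Sample text shaped like the example repository (merge, B, D, A).
sample = "0000004 0000003 0000002\n0000003 0000001\n0000002 0000001\n0000001\n"
parents_of = parse_rev_list(sample)
for commit, parents in parents_of.items():
    print(commit, parents)  # TODO: Do check on commit
```

Every commit appears as a key exactly once, so no visited set is needed, and only a single git process is started.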
