user1816561

Reputation: 339

gitpython and git diff

I want to get only the diff of a changed file from a git repo. Right now, I am using GitPython to get the commit objects and the files changed in each commit, but I want to do a dependency analysis on only the parts of each file that changed. Is there any way to get the git diff from GitPython? Or will I have to compare the files by reading them line by line?

Upvotes: 30

Views: 58786

Answers (8)

Nikola Đuza

Reputation: 535

If you want to run git diff on a file between two commits, this is one way to do it:

import git

repo = git.Repo()
path_to_a_file = "diff_this_file_across_commits.txt"

# Commits that touched the file, newest first
commits_touching_path = list(repo.iter_commits(paths=path_to_a_file))

# Diff the file between the two most recent of those commits
print(repo.git.diff(commits_touching_path[0], commits_touching_path[1], path_to_a_file))

This will show you the differences between the two most recent commits that touched the file you specify.

Upvotes: 1

土豆先生

Reputation: 11

repo.git.diff("main", "HEAD~5")

result:

@@ -97,6 +97,25 @@
+ </configuration>
+
+ </plugin>
+ <plugin>
- <groupId>org.codehaus.mojo</groupId>
- <artifactId>findbugs-maven-plugin</artifactId>
- <version>3.0.5</version>
- <configuration>
- <effort>Low</effort>
- <threshold>Medium</threshold>
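
If you only care about which line ranges changed, the hunk headers (the `@@ -97,6 +97,25 @@` lines) in that output already carry the information. Here is a minimal sketch that pulls them out of the diff text with a regular expression; the function name `parse_hunk_headers` is my own and is not part of GitPython.

```python
import re

# Matches unified-diff hunk headers such as "@@ -97,6 +97,25 @@".
# The ",count" part is optional; it is omitted when the count is 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_headers(diff_text):
    """Return (old_start, old_count, new_start, new_count) for each hunk."""
    hunks = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            old_start, old_count, new_start, new_count = match.groups()
            hunks.append((
                int(old_start),
                int(old_count) if old_count is not None else 1,
                int(new_start),
                int(new_count) if new_count is not None else 1,
            ))
    return hunks
```

You would call it on the string returned by `repo.git.diff(...)`. Each tuple gives the starting line and line count in the old and new versions of the file.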

Upvotes: 1

Greg Hewgill

Reputation: 993075

Git does not store the diffs, as you have noticed. Given two blobs (before and after a change), you can use Python's difflib module to compare the data.
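
A minimal sketch of that approach, assuming you have already read and decoded the two blobs into strings (for example with `blob.data_stream.read().decode('utf-8')` in GitPython). The helper name `unified_diff_text` and the sample strings are made up for illustration.

```python
import difflib


def unified_diff_text(old_text, new_text, path="file.txt"):
    """Build a git-style unified diff from two versions of a file's text."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


# Hypothetical before/after contents standing in for two blobs
old = "import os\nprint('hello')\n"
new = "import os\nimport sys\nprint('hello')\n"
print(unified_diff_text(old, new))
```

This stays entirely in Python, at the cost of recomputing what `git diff` would give you directly.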

Upvotes: 6

K. Symbol

Reputation: 3692

+1 for PyDriller:

pip install pydriller

Note that newer releases changed the API in a breaking way. With the new API:

from pydriller import Repository

for commit in Repository('https://github.com/ishepard/pydriller').traverse_commits():
    print(commit.hash)
    print(commit.msg)
    print(commit.author.name)

    for file in commit.modified_files:
        print(file.filename, ' has changed')

Upvotes: -1

Cairo

Reputation: 616

You can use GitPython to run the git "diff" command. You just need the "tree" object of each commit, or the name of the branch, whose diffs you want to see. For example:

from git import Repo

repo = Repo('/git/repository')
t = repo.head.commit.tree
print(repo.git.diff(t))

This gives you the diffs for all changed files at once (that tree compared with your working tree), so if you want each file's diff separately you must iterate over them.
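
One way to do that iteration is to split the combined text at each `diff --git` header. This is only a sketch: `split_diff_by_file` is a hypothetical helper, and it assumes the default `a/` and `b/` path prefixes and paths that do not themselves contain " b/".

```python
def split_diff_by_file(diff_text):
    """Split a multi-file git diff into a {path: diff_chunk} dictionary."""
    chunks = {}
    current_path = None
    current_lines = []
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # Store the chunk collected for the previous file, if any
            if current_path is not None:
                chunks[current_path] = "".join(current_lines)
            # Header looks like: diff --git a/<path> b/<path>
            current_path = line.rstrip("\n").split(" b/", 1)[-1]
            current_lines = []
        current_lines.append(line)
    if current_path is not None:
        chunks[current_path] = "".join(current_lines)
    return chunks
```

Passing it the string from `repo.git.diff(...)` should give you one entry per changed file, keyed by the new path.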

For the current branch it's:

repo.git.diff('HEAD~1')

Hope this helps.

Upvotes: 20

Ciasto piekarz

Reputation: 8277

Here is how you do it

import git
repo = git.Repo("path/of/repo/")

# the below gives us all commits, newest first
commits = list(repo.iter_commits())

# take the two most recent commits

a_commit = commits[0]
b_commit = commits[1]

# now get the diff
print(repo.git.diff(a_commit, b_commit))

Upvotes: -2

ZaxR

Reputation: 5155

If you're looking to recreate something close to what a standard git diff would show, try:

# cloned_repo = git.Repo.clone_from(
#     url=ssh_url,
#     to_path=repo_dir,
#     env={"GIT_SSH_COMMAND": "ssh -i " + SSH_KEY},
# )
repo_diff = ""
# index.diff(None) compares the index with the working tree (unstaged changes)
for diff_item in cloned_repo.index.diff(None, create_patch=True):
    repo_diff += (
        f"--- a/{diff_item.a_path}\n+++ b/{diff_item.b_path}\n"
        f"{diff_item.diff.decode('utf-8')}\n\n"
    )

Upvotes: 6

D. A.

Reputation: 3509

If you want to access the contents of the diff, try this:

import git

# repo_root is assumed to be a pathlib.Path pointing at the repository
repo = git.Repo(repo_root.as_posix())
commit_dev = repo.commit("dev")
commit_origin_dev = repo.commit("origin/dev")
diff_index = commit_origin_dev.diff(commit_dev)

# 'M' selects files that were modified (not added, deleted, or renamed)
for diff_item in diff_index.iter_change_type('M'):
    print("A blob:\n{}".format(diff_item.a_blob.data_stream.read().decode('utf-8')))
    print("B blob:\n{}".format(diff_item.b_blob.data_stream.read().decode('utf-8')))

This will print the full before (A) and after (B) contents of each modified file.
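
Since the question is about analysing only the changed parts, here is a rough sketch of how the two decoded blob contents could be reduced to the line numbers that changed, using the standard library's `difflib.SequenceMatcher`. The function name `changed_line_numbers` is hypothetical, and deleted lines are ignored because they have no position in the new file.

```python
import difflib


def changed_line_numbers(old_text, new_text):
    """Return 1-based line numbers in new_text that were added or modified."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that produce lines in new_text
        if tag in ("replace", "insert"):
            changed.extend(range(j1 + 1, j2 + 1))
    return changed
```

In the loop above you would pass it the decoded A blob and B blob strings, then restrict your analysis to the returned lines.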

Upvotes: 26
