hendry

Reputation: 10823

Finding the last change of a file with git, fast

I keep most of my data in git, and I need to know when a file was last committed (or changed). For example:

$ time git log -1 --format=%ai -- test
2015-09-09 10:51:50 +0800
real    0m0.003s
user    0m0.000s
sys     0m0.000s

However, I've discovered that on non-trivial repositories this gets slower, e.g. real 0m0.121s on a repo that is not very big. If I am checking hundreds of files this way, it gets very slow indeed!

The obvious alternative is to use the file's modification time, which is fast:

$ time stat --printf="Change %z\nAccess %x\nModify %y\n" test
Change 2015-09-09 10:51:07.764630748 +0800
Access 2015-09-09 10:51:50.877882489 +0800
Modify 2015-09-09 10:51:07.764630748 +0800

real    0m0.001s
user    0m0.000s
sys     0m0.000s

But this only shows the last modification on the filesystem.

For example, I have a file maintained in git that was last changed in 2014. If I clone the repository locally and use the modification time to find the last change, it appears to have happened in the current year, 2015. This is misleading.

So, how can I make it faster to find the last change of a file by git's reckoning? Or have I missed an easy trick (no perl scripts please), like fixing the times on a clone/fetch?

Upvotes: 0

Views: 412

Answers (3)

Roland Smith

Reputation: 43495

Faced with the problem of generating this data for all the files in a repository, I wrote the gitdates.py script.

It uses git log, but parallelizes the work as much as possible by effectively starting as many git log commands as your CPU has cores.

It takes around 4 seconds to query the dates of all files in a repo with 258 files and 200 commits. That comes to roughly 0.015 seconds per file.
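
For illustration, the idea of running several git log commands at once can be sketched in a few lines of Python. This is not the gitdates.py source, only a minimal sketch of the same approach. The function names and the `run` parameter (present only so the git call can be swapped out) are made up for this example.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def last_commit_date(path, run=subprocess.check_output):
    """Return the author date of the last commit touching `path` ('' if none)."""
    out = run(["git", "log", "-1", "--format=%ai", "--", path])
    return out.decode().strip()


def last_commit_dates(paths, workers=None, run=subprocess.check_output):
    """Query many paths concurrently; returns {path: date string}."""
    paths = list(paths)
    # Assumption: one worker per CPU core. Threads are enough here because
    # the real work happens in the git child processes.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dates = pool.map(lambda p: last_commit_date(p, run), paths)
        return dict(zip(paths, dates))
```

Called from inside a working tree, something like `last_commit_dates(["README", "test"])` should return a dictionary of date strings in the same format as `git log -1 --format=%ai`.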

Upvotes: 1

dpcasady

Reputation: 1846

You might see slight performance improvements by using one of the plumbing commands, like rev-list, which, from the documentation, "lists commit objects in reverse chronological order". The log command actually gets its results from rev-list behind the scenes.

That being said, nothing from git will give you the performance improvement that I think you're looking for. Remember that git doesn't track files, it tracks content. To find the last time a file was changed, you need to traverse the commit tree until you find content that is tied to the file in question. As you've pointed out, the farther back the file was edited, the longer it will take to traverse the tree.

You can shave a few milliseconds off with something like this (piping into sed to isolate the timestamp):

$ time git rev-list --pretty --format=%ai --max-count=1 --first-parent master test | sed -n 2p
2012-01-08 17:01:11 +0000

real    0m0.149s
user    0m0.134s
sys     0m0.016s

$ time git log -1 --format=%ai -- test
2012-01-08 17:01:11 +0000

real    0m0.166s
user    0m0.148s
sys     0m0.016s

There are plenty of options to rev-list; you may find others that speed it up further.
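
Since the cost comes from traversing history once per file, one option when checking many files is to traverse it only once for all of them. The Python sketch below (not benchmarked here) parses the output of `git log --format=@%ai --name-only` and keeps the first, i.e. newest, date seen for each path. The `@` marker and the function names are assumptions made for this example.

```python
import subprocess


def parse_log(log_text):
    """Map each path to the date of the newest commit that lists it.

    Expects the output of: git log --format=@%ai --name-only
    (newest commit first, '@' prefixing each date line).
    Assumption: no tracked path starts with '@', and no path needs
    quoting by git (unusual characters are quoted by default).
    """
    last_change = {}
    date = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            date = line[1:]
        elif line and date is not None:
            # The first occurrence is the newest commit, so never overwrite.
            last_change.setdefault(line, date)
    return last_change


def last_changes():
    """Walk the history of the current repository once."""
    out = subprocess.check_output(
        ["git", "log", "--format=@%ai", "--name-only"])
    return parse_log(out.decode())
```

This still reads the whole history, so it only pays off when many files are queried at once. Merge commits list no files by default, so in unusual histories the result may differ from what `git log -1 -- file` reports.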

Upvotes: 1

David Neiss

Reputation: 8237

What about using a commit hook to update a file that maps each committed file name to the current time? You could keep that file as a git note in the repo. You would have to figure out how you want to deal with branches; perhaps use the branch name as a path prefix to the file name.
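
A rough Python sketch of what such a hook might do, assuming it runs as a post-commit hook and stores the map as a JSON file. The JSON format, the `map_path` argument and all function names are hypothetical, and storing the result as a git note is left out.

```python
import json
import os
import subprocess
from datetime import datetime


def update_map(mapping, branch, files, now):
    """Record `now` for every committed file, keyed as '<branch>/<path>'."""
    for name in files:
        mapping[f"{branch}/{name}"] = now
    return mapping


def committed_files():
    """Paths touched by HEAD (merge commits list nothing by default)."""
    out = subprocess.check_output(
        ["git", "diff-tree", "--root", "--no-commit-id",
         "--name-only", "-r", "HEAD"])
    return out.decode().splitlines()


def current_branch():
    out = subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
    return out.decode().strip()


def record_commit(map_path):
    """Load the JSON map at `map_path` (hypothetical location), update, save."""
    mapping = {}
    if os.path.exists(map_path):
        with open(map_path) as f:
            mapping = json.load(f)
    now = datetime.now().astimezone().isoformat()
    update_map(mapping, current_branch(), committed_files(), now)
    with open(map_path, "w") as f:
        json.dump(mapping, f, indent=1, sort_keys=True)
```

Looking up a file then becomes a dictionary read instead of a history walk, at the price of recording the time the hook ran rather than the date stored in the commit.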

Upvotes: 0
