Reputation: 10823
I keep most of my data in git. And I need to know when the last commit (or change) of a file was. For example:
$ time git log -1 --format=%ai -- test
2015-09-09 10:51:50 +0800
real 0m0.003s
user 0m0.000s
sys 0m0.000s
However, I've discovered that on non-trivial repositories this can get slower and slower... e.g. real 0m0.121s on a not very big repo. And if I am checking hundreds of files this way, it gets very slow indeed!
The obvious alternative is to use the filesystem modification time instead, which is fast:
$ time stat --printf="Change %z\nAccess %x\nModify %y\n" test
Change 2015-09-09 10:51:07.764630748 +0800
Access 2015-09-09 10:51:50.877882489 +0800
Modify 2015-09-09 10:51:07.764630748 +0800
real 0m0.001s
user 0m0.000s
sys 0m0.000s
But this only shows the last modification on the filesystem.
For example, I have a file maintained in git that was last changed in 2014. If I clone the repo locally and use the modification time to check, the last change will appear to have happened in the current year, 2015. This is misleading.
So, how can I make it faster to find the last change to a file by git's reckoning? Or have I missed an easy trick (no perl scripts please), like fixing the times on a clone/fetch?
Upvotes: 0
Views: 412
Reputation: 43495
Faced with the problem of generating this data for all the files in a repository, I wrote the gitdates.py script. It uses git log, but parallelizes the work as much as possible by effectively starting as many git log commands as your CPU has cores.
It takes around 4 seconds to query the dates of all files in a repo with 258 files and 200 commits, which comes to roughly 0.015 seconds per file.
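The idea is simple enough to sketch. The snippet below is an illustrative sketch of the same approach, not the gitdates.py script itself: it lists the tracked files with git ls-files and runs one git log -1 --format=%ai -- <file> per file across a pool of workers, one per CPU core.

#!/usr/bin/env python3
# Sketch: query the last commit date of every tracked file, running the
# per-file "git log" calls concurrently (one worker per CPU core).
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def last_commit_date(path):
    # Same command the question uses, limited to one file.
    out = subprocess.run(["git", "log", "-1", "--format=%ai", "--", path],
                         capture_output=True, text=True, check=True).stdout
    return path, out.strip()

# All files tracked in the current working tree.
files = subprocess.run(["git", "ls-files"], capture_output=True,
                       text=True, check=True).stdout.splitlines()

with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    for path, date in pool.map(last_commit_date, files):
        print(f"{date}  {path}")

Threads are enough here because the real work happens in the child git processes; the thread pool just keeps one git log running per core.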
Upvotes: 1
Reputation: 1846
You might see slight performance improvements by using one of the plumbing commands, like rev-list, which, from the documentation, "lists commit objects in reverse chronological order". The log command actually gets its results from rev-list behind the scenes.
That being said, nothing from git will give you the performance improvement that I think you're looking for. Remember that git doesn't track files, it tracks content. To find the last time a file was changed, you need to traverse the commit tree until you find content that is tied to the file in question. As you've pointed out, the farther back the file was edited, the longer it will take to traverse the tree.
You can shave a few milliseconds off with something like this (piping into sed to isolate the timestamp):
$ time git rev-list --pretty --format=%ai --max-count=1 --first-parent master test | sed -n 2p
2012-01-08 17:01:11 +0000
real 0m0.149s
user 0m0.134s
sys 0m0.016s
$ time git log -1 --format=%ai -- test
2012-01-08 17:01:11 +0000
real 0m0.166s
user 0m0.148s
sys 0m0.016s
There are plenty of options to rev-list; you may find others that can speed it up further.
Upvotes: 1
Reputation: 8237
What about using a commit hook to update a file that maps each committed file name to the current time, and keeping that file as a git note in the repo? You would have to figure out how you want to deal with branches; perhaps use the branch name as a path prefix to the file name.
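A rough sketch of that idea follows. It is not a tested solution: the "file-dates" notes ref and the tab-separated path/date format are assumptions made for illustration. A post-commit hook copies the mapping from the note on the parent commit, stamps the paths touched by the new commit with its author date, and attaches the result to HEAD as a note.

#!/usr/bin/env python3
# Sketch of a .git/hooks/post-commit hook. The "file-dates" notes ref and the
# "path<TAB>date" mapping format are illustrative assumptions.
import subprocess

def git(*args):
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout

# Paths touched by the commit that just landed.
changed = git("diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD").splitlines()
# Its author date, in the same %ai format used in the question.
when = git("log", "-1", "--format=%ai", "HEAD").strip()

# Start from the mapping recorded on the parent commit, if there is one.
mapping = {}
try:
    for line in git("notes", "--ref=file-dates", "show", "HEAD~1").splitlines():
        path, _, date = line.partition("\t")
        if path:
            mapping[path] = date
except subprocess.CalledProcessError:
    pass  # first commit, or no note yet

for path in changed:
    mapping[path] = when

if mapping:
    note = "\n".join(f"{p}\t{d}" for p, d in sorted(mapping.items()))
    git("notes", "--ref=file-dates", "add", "-f", "-m", note, "HEAD")

Looking up a file's last change then becomes a single git notes --ref=file-dates show HEAD lookup instead of a history walk. The branch question remains: in this sketch each branch's HEAD carries its own copy of the mapping.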
Upvotes: 0