Reputation: 689
Given a git repo, I need to generate a dictionary of each version controlled file's last modified date as a unix timestamp mapped to its file path. I need the last modified date as far as git is concerned - not the file system.
In order to do this, I'd like to get git to output a list of all files under version control along with each file's author date. The output from git ls-files or git ls-tree -r master would be perfect if it had timestamps included on each line.
Is there a way to get this output from git?
Update for more context: I have a current implementation that consists of a Python script that iterates through every file under source control and runs git log on each one, but I'm finding that it doesn't scale well. The more files in the repo, the more git log calls I have to make. That has led me to look for a way to gather this info from git with fewer calls (ideally just one).
Upvotes: 9
Views: 2764
Reputation: 39243
Here you go:
git ls-files -z | xargs -0 -I{} git log -1 --format='%at {}' -- {}
This works in bash and probably in sh. Note that it still runs one git log per file.
Upvotes: 0
Reputation: 60255
a list of all files under version control along with each file's author date
Scaling isn't a problem with this one:
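#!/bin/sh
# Usage: pass any git log arguments (revisions, paths) on the command line.
# Output: one "author-timestamp<TAB>status<TAB>path" line per file, sorted by path.
temp="${TMPDIR:-/tmp}/@@@commit-at@@@$$"
trap "rm '$temp'" 0 1 2 3 15
# Record "hash<TAB>author-timestamp" for every commit, oldest first.
git log --pretty=format:"%H%x09%at" --topo-order --reverse "$@" >"$temp"
# One diff-tree call lists the files each commit touched; awk keeps the newest entry per path.
cut -f1 "$temp" \
| git diff-tree -r --root --name-status --stdin \
| awk '
    BEGIN {FS="\t"; OFS="\t"}
    FNR==1 {++f}                           # f==1: the temp file, f==2: stdin
    f==1 {at[$1]=$2; next}                 # remember the timestamp of each commit
    NF==1 {commit=$1; next}                # diff-tree prints the commit hash first
    $1=="D" {$1=""; delete last[$0]; next} # comment to also show deleted files
    {did=$1; $1=""; last[$0]=at[commit]"\t"did}
    END {for (f in last) print last[f]f}
' "$temp" - \
| sort -t"`printf '\t'`" -k3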
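Since the question asks for a dictionary, the script's output still has to be parsed on the Python side. A minimal sketch, assuming each output line has the three tab-separated fields the awk program prints (author timestamp, status letter, path); the function name parse_commit_at is made up for this illustration:

```python
def parse_commit_at(lines):
    """Turn 'timestamp TAB status TAB path' lines into {path: timestamp}."""
    dates = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        # Split at most twice so a tab inside the path stays part of the path.
        stamp, _status, path = line.split('\t', 2)
        dates[path] = int(stamp)
    return dates
```

With the script's output piped into a Python process, parse_commit_at(sys.stdin) would return the mapping.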
Upvotes: 1
Reputation: 9488
What I would do is run git ls-files and add all of the paths to an array, then run git log $date_args --name-only and parse its output: each time a file shows up, record its date in a dictionary and remove the file from the array, and stop processing once the array is empty.
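A rough Python sketch of that idea follows. It is an illustration under assumptions, not a tested tool: the function names are made up, the --format=%x00%at marker is just one way to tell commit header lines from file names, the repository path is assumed to be the top-level directory, and paths containing control characters, quotes or backslashes (which git log prints quoted) are not handled. It relies on git log listing commits newest first, so the first commit in which a path appears is taken as its last modification.

```python
import subprocess


def last_modified(lines, tracked):
    """Return {path: author timestamp} from git log output lines.

    The lines must come from `git log --format=%x00%at --name-only`.
    """
    remaining = set(tracked)
    dates = {}
    stamp = None
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith('\0'):
            stamp = int(line[1:])   # header line of the next commit
        elif line in remaining:
            dates[line] = stamp     # first sighting is the newest commit
            remaining.remove(line)
            if not remaining:       # every tracked file is dated: stop early
                break
    return dates


def git_file_dates(repo='.'):
    """Date all tracked files with two git calls in total."""
    out = subprocess.check_output(
        ['git', 'ls-files', '-z'], cwd=repo, text=True)
    tracked = [p for p in out.split('\0') if p]
    log = subprocess.Popen(
        ['git', '-c', 'core.quotePath=false', 'log',
         '--format=%x00%at', '--name-only'],
        cwd=repo, stdout=subprocess.PIPE, text=True)
    try:
        return last_modified(log.stdout, tracked)
    finally:
        # Stop git as soon as the dictionary is complete.
        log.stdout.close()
        log.terminate()
        log.wait()
```

Calling git_file_dates() from the top of a work tree would return the mapping. Reading the log as a stream means git can be stopped before it walks the whole history when all files were touched recently.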
Upvotes: 0
Reputation: 43495
I wrote the following script to output, for each file, the path, short hash and date.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-03-23 01:09:59 +0100 $
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

import os
import sys
import subprocess
import time
from multiprocessing import Pool

# Suppress terminal windows on MS Windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW


def filecheck(fname):
    """Start a git process to get file info. Return a tuple
    containing the filename, the abbreviated commit hash and the
    author date as a time.struct_time in UTC.

    Arguments:
    fname -- Name of the file to check.
    """
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', '--', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]
        h, t = data.split('|')
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        # Files without any commit get an empty hash and the epoch as date.
        return (fname[2:], '', time.gmtime(0.0))
    return out


def main():
    """Main program."""
    # Check for a repository before running any git command.
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
    # Get a list of excluded files (decoded, so they compare equal to str paths).
    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    exc = set(subprocess.check_output(exargs).decode().splitlines())
    # Get a list of all files
    allfiles = []
    for root, dirs, files in os.walk('.'):
        if '.git' in dirs:
            dirs.remove('.git')
        paths = [os.path.join(root, f) for f in files]
        # git prints paths relative to the current directory with '/' separators.
        allfiles += [p for p in paths
                     if p[2:].replace(os.sep, '/') not in exc]
    # Gather the files' data using a Pool.
    p = Pool()
    filedata = []
    for res in p.imap_unordered(filecheck, allfiles):
        filedata.append(res)
    p.close()
    # Sort the data (latest modified first) and print it
    filedata.sort(key=lambda a: a[2], reverse=True)
    dfmt = '%Y-%m-%d %H:%M:%S %Z'
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))


if __name__ == '__main__':
    main()
Upvotes: 0