Reputation: 2343
I'm currently thinking of changing my VCS (from Subversion) to Git. Is it possible to limit the file size within a commit in a Git repository? For Subversion, for example, there is a hook: http://www.davidgrant.ca/limit_size_of_subversion_commits_with_this_hook
In my experience, people (especially inexperienced ones) sometimes commit files which should not go into a VCS, e.g. big filesystem images.
Upvotes: 31
Views: 15685
Reputation: 1323115
Another way is to version a .gitignore file, which will prevent any file with a certain extension from showing up in the status.
You still can have hooks as well (on downstream or upstream, as suggested by the other answers), but at least all downstream repos can include that .gitignore to avoid adding .exe, .dll, .iso, ...
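For instance, a minimal .gitignore covering just the extensions mentioned above could contain:

*.exe
*.dll
*.iso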
If you are using hooks, consider Git 2.42 (Q3 2023): some atoms that can be used in "--format=<format>" for "git ls-tree"(man) were not supported by "git ls-files"(man), even though they were relevant in the context of the latter.
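A quick usage sketch (assuming Git 2.42 or later, where the atoms described below are available in git ls-files): list the index entries with their object sizes so that oversized blobs stand out:

git ls-files --format='%(objectsize:padded) %(objecttype) %(path)' | sort -rn | head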
See commit 4d28c4f (23 May 2023) by ZheNing Hu (adlternative).
(Merged by Junio C Hamano -- gitster -- in commit 32fe7ff, 13 Jun 2023)

ls-files: align format atoms with ls-tree

Signed-off-by: ZheNing Hu

"git ls-files --format"(man) can be used to format the output of multiple file entries in the index, while "git ls-tree --format"(man) can be used to format the contents of a tree object.
However, the current set of atoms supported by "git ls-files --format" is a subset of what is available in "git ls-tree --format"(man): "%(objecttype)", "%(objectsize)", and "%(objectsize:padded)" were missing.
Users sometimes need to establish a unified view between the index and tree, which can help with comparison or conversion between the two.
Therefore, this patch adds the missing atoms to "git ls-files --format":

- "%(objecttype)" can be used to retrieve the object type corresponding to a file in the index,
- "%(objectsize)" can be used to retrieve the object size corresponding to a file in the index, and
- "%(objectsize:padded)" is the same as "%(objectsize)", except with padded format.

git ls-files now includes in its man page:

objecttype
    The object type of the file which is recorded in the index.

git ls-files now includes in its man page:

objectsize[:padded]
    The object size of the file which is recorded in the index ("-" if the object is a commit or tree). It also supports a padded format of size with "%(objectsize:padded)".
Upvotes: 0
Reputation: 111
I want to highlight another set of approaches that address this issue at the pull request stage: GitHub Actions and Apps. It doesn't stop large files from being committed into a branch, but if they're removed prior to the merge then the resulting base branch will not have the large files in history.
There's a recently developed action that checks the added file sizes (through the GitHub API) against a user-defined reference value: lfs-warning.
I've also personally hacked together a Probot app to screen for large file sizes in a PR (against a user-defined value), but it's much less efficient: sizeCheck
Upvotes: 1
Reputation: 15090
You need a solution that caters to the following scenarios: a push whose tip commit contains a large file, and a push where a large file was added in an intermediate commit (even if it was removed again before the tip, it would still end up in history).
This hook (https://github.com/mgit-at/git-max-filesize) deals with both cases and seems to also correctly handle edge cases such as new branch pushes and branch deletes.
Upvotes: 0
Reputation: 6561
This one is pretty good:
#!/bin/bash -u
#
# git-max-filesize
#
# git pre-receive hook to reject large files that should be committed
# via git-lfs (large file support) instead.
#
# Author: Christoph Hack <[email protected]>
# Copyright (c) 2017 mgIT GmbH. All rights reserved.
# Distributed under the Apache License. See LICENSE for details.
#
set -o pipefail

readonly DEFAULT_MAXSIZE="5242880" # 5MB
readonly CONFIG_NAME="hooks.maxfilesize"
readonly NULLSHA="0000000000000000000000000000000000000000"
readonly EXIT_SUCCESS="0"
readonly EXIT_FAILURE="1"

# main entry point
function main() {
  local status="$EXIT_SUCCESS"

  # get maximum filesize (from repository-specific config)
  local maxsize
  maxsize="$(get_maxsize)"
  if [[ "$?" != 0 ]]; then
    echo "failed to get ${CONFIG_NAME} from config"
    exit "$EXIT_FAILURE"
  fi

  # skip this hook entirely if maxsize is 0.
  if [[ "$maxsize" == 0 ]]; then
    cat > /dev/null
    exit "$EXIT_SUCCESS"
  fi

  # read lines from stdin (format: "<oldref> <newref> <refname>\n")
  local oldref
  local newref
  local refname
  while read oldref newref refname; do
    # skip branch deletions
    if [[ "$newref" == "$NULLSHA" ]]; then
      continue
    fi

    # find large objects
    # check all objects from $oldref (possible $NULLSHA) to $newref, but
    # skip all objects that have already been accepted (i.e. are referenced by
    # another branch or tag).
    local target
    if [[ "$oldref" == "$NULLSHA" ]]; then
      target="$newref"
    else
      target="${oldref}..${newref}"
    fi
    local large_files
    large_files="$(git rev-list --objects "$target" --not --branches=\* --tags=\* | \
      git cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' | \
      awk -F '\t' -v maxbytes="$maxsize" '$3 > maxbytes' | cut -f 4-)"
    if [[ "$?" != 0 ]]; then
      echo "failed to check for large files in ref ${refname}"
      continue
    fi

    IFS=$'\n'
    for file in $large_files; do
      if [[ "$status" == 0 ]]; then
        echo ""
        echo "-------------------------------------------------------------------------"
        echo "Your push was rejected because it contains files larger than $(numfmt --to=iec "$maxsize")."
        echo "Please use https://git-lfs.github.com/ to store larger files."
        echo "-------------------------------------------------------------------------"
        echo ""
        echo "Offending files:"
        status="$EXIT_FAILURE"
      fi
      echo " - ${file} (ref: ${refname})"
    done
    unset IFS
  done

  exit "$status"
}

# get the maximum filesize configured for this repository or the default
# value if no specific option has been set. Suffixes like 5k, 5m, 5g, etc.
# can be used (see git config --int).
function get_maxsize() {
  local value;
  value="$(git config --int "$CONFIG_NAME")"
  if [[ "$?" != 0 ]] || [[ -z "$value" ]]; then
    echo "$DEFAULT_MAXSIZE"
    return "$EXIT_SUCCESS"
  fi
  echo "$value"
  return "$EXIT_SUCCESS"
}

main
You can configure the size in the server-side config file by adding:
[hooks]
  maxfilesize = 1048576 # 1 MiB
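To use it, install the script as the pre-receive hook of the server-side (bare) repository; a minimal sketch with hypothetical paths:

cp git-max-filesize /srv/git/project.git/hooks/pre-receive
chmod +x /srv/git/project.git/hooks/pre-receive
git -C /srv/git/project.git config hooks.maxfilesize 10m

(The 10m suffix works because the hook reads the value with git config --int, which accepts k/m/g suffixes.)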
Upvotes: 8
Reputation: 53462
As I was struggling with this for a while, even with the description, and I think it is relevant for others too, I thought I'd post an implementation of what J-16 SDiZ described.
So, my take on a server-side update hook that prevents too-big files from being pushed:
#!/bin/bash
# Script to limit the size of a push to git repository.
# Git repo has issues with big pushes, and we shouldn't have a real need for those
#
# eis/02.02.2012
# --- Safety check, should not be run from command line
if [ -z "$GIT_DIR" ]; then
  echo "Don't run this script from the command line." >&2
  echo " (if you want, you could supply GIT_DIR then run" >&2
  echo " $0 <ref> <oldrev> <newrev>)" >&2
  exit 1
fi
# Test that tab replacement works, issue in some Solaris envs at least
testvariable=`echo -e "\t" | sed 's/\s//'`
if [ "$testvariable" != "" ]; then
  echo "Environment check failed - please contact git hosting." >&2
  exit 1
fi
# File size limit is meant to be configured through 'hooks.filesizelimit' setting
filesizelimit=$(git config hooks.filesizelimit)
# If we haven't configured a file size limit, use default value of about 100M
if [ -z "$filesizelimit" ]; then
  filesizelimit=100000000
fi
# Reference to incoming checkin can be found at $3
refname=$3
# With this command, we can find information about the file coming in that has biggest size
# We also normalize the line for excess whitespace
biggest_checkin_normalized=$(git ls-tree --full-tree -r -l $refname | sort -k 4 -n -r | head -1 | sed 's/^ *//;s/ *$//;s/\s\{1,\}/ /g' )
# Based on that, we can find what we are interested about
filesize=`echo $biggest_checkin_normalized | cut -d ' ' -f4,4`
# Actual comparison
# To cancel a push, we exit with status code 1
# It is also a good idea to print out some info about the cause of rejection
if [ $filesize -gt $filesizelimit ]; then
  # To be more user-friendly, we also look up the name of the offending file
  filename=`echo $biggest_checkin_normalized | cut -d ' ' -f5,5`

  echo "Error: Too large push attempted." >&2
  echo >&2
  echo "File size limit is $filesizelimit, and you tried to push file named $filename of size $filesize." >&2
  echo "Contact configuration team if you really need to do this." >&2
  exit 1
fi
exit 0
Note that, as has been pointed out in the comments, this code only checks the latest commit, so it would need to be tweaked to iterate over the commits between $2 and $3 and apply the check to all of them.
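A minimal sketch of such a tweak (not part of the original answer; it assumes the usual update-hook arguments, where $2 is the old revision and $3 the new one):

oldrev=$2
newrev=$3
for rev in $(git rev-list "$oldrev..$newrev"); do
  biggest_checkin_normalized=$(git ls-tree --full-tree -r -l "$rev" | sort -k 4 -n -r | head -1 | sed 's/^ *//;s/ *$//;s/\s\{1,\}/ /g')
  # ...then apply the same size comparison as above to each revision...
done

(For brand-new branches $2 is all zeroes, so the range would need special-casing, as the pre-receive hook above does.)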
Upvotes: 27
Reputation: 36423
Yes, Git has hooks as well (git hooks). But it somewhat depends on the actual workflow you will be using.
If you have inexperienced users, it is much safer to pull from them than to let them push. That way, you can make sure they won't screw up the main repository.
Upvotes: 2
Reputation: 119
The answers by eis and J-16 SDiZ suffer from a severe problem: they only check the state of the final commit, $3 or $newrev. They also need to check what is being submitted in the other commits between $2 (or $oldrev) and $3 (or $newrev) in the update hook.
J-16 SDiZ's answer is closer to the right one.
The big flaw is that someone whose departmental server has this update hook installed to protect it will find out the hard way that:
after using git rm to remove a big file that was accidentally checked in, only the current tree or last commit will look fine, while the push will still bring in the entire chain of commits, including the one with the big file that was later deleted, creating a swollen, unhappy, fat history that nobody wants.
The solution is either to check each and every commit from $oldrev to $newrev, or to specify the entire range $oldrev..$newrev. Be darn sure you are not just checking $newrev alone, or this will fail with massive junk in your git history, pushed out to share with others, and then difficult or impossible to remove after that.
Upvotes: 11
Reputation: 71
If you are using gitolite, you can also try a VREF. There is one VREF already provided by default (the code is in gitolite/src/VREF/MAX_NEWBIN_SIZE). It is called MAX_NEWBIN_SIZE. It works like this:

repo name
    RW+                             =   username
    -   VREF/MAX_NEWBIN_SIZE/1000   =   usernames

where 1000 is an example threshold in bytes.
This VREF works like an update hook and will reject your push if any file you are pushing is larger than the threshold.
Upvotes: 4
Reputation: 195
I am using gitolite, and the update hook was already in use, so instead of the update hook I used the pre-receive hook. The script posted by Chriki worked fabulously, with the exception that the data is passed via stdin, so I made a one-line change:

- refname=$3
+ read a b refname

(there may be a more elegant way to do that, but it works)
Upvotes: 0
Reputation: 26910
You can use a hook, either a pre-commit hook (on the client) or an update hook (on the server). Do a git ls-files --cached (for pre-commit) or git ls-tree --full-tree -r -l $3 (for update) and act accordingly.
git ls-tree -l would give something like this:
100644 blob 97293e358a9870ac4ddf1daf44b10e10e8273d57 3301 file1
100644 blob 02937b0e158ff8d3895c6e93ebf0cbc37d81cac1 507 file2
Grab the fourth column; that is the size. Use git ls-tree --full-tree -r -l HEAD | sort -k 4 -n -r | head -1 to get the largest file. cut to extract, if [ a -lt b ] to check the size, etc.
Sorry, I think if you are a programmer, you should be able to do this yourself.
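Still, a minimal client-side sketch (a .git/hooks/pre-commit file assuming a 5 MB limit; not from the original answer):

#!/bin/bash
# Reject a commit if any staged blob is larger than the limit.
limit=5242880 # 5 MB
status=0
while read -r mode object stage path; do
  size=$(git cat-file -s "$object")
  if [ "$size" -gt "$limit" ]; then
    echo "error: ${path} is ${size} bytes (limit: ${limit})" >&2
    status=1
  fi
done < <(git ls-files --cached --stage)
exit "$status"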
Upvotes: -3
Reputation: 301037
From what I have seen, it is going to be a very rare case when someone checks in, say, a 200 MB or even larger file.
While you can prevent this from happening by using server-side hooks (client-side hooks are less reliable, since you have to rely on each person having the hooks installed), much like you would in SVN, you also have to take into account that in Git it is much, much easier to remove such a file or commit from the repository afterwards. You did not have that luxury in SVN, at least not an easy way.
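For example, stripping every blob over a given size from history is a short operation nowadays (a sketch, assuming the separate git-filter-repo tool is installed; rewriting already-published history still requires coordination with everyone who has cloned):

git filter-repo --strip-blobs-bigger-than 10M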
Upvotes: 0