maaartinus

Reputation: 46492

How to find out the space requirements of files to be committed?

I'm going to archive an old huge project containing a lot of garbage. I hope I'll never need it again, but I want to put all important things under version control. Because of the chaos in the project, it's not easy to say what are the sources and what can go away (there's no makefile, no make clean, nothing). So I'd like to put there nearly everything and consider only the largest files for exclusion.

How can I list the files to be committed (or to be staged) together with their size?

I could write a script or whatever, but I'm hoping for a simpler solution. I'm working under Cygwin and the only GUI available is git gui, which doesn't show file sizes. Otherwise it'd be perfect for what I need.

Upvotes: 14

Views: 4035

Answers (4)

Stripy42

Reputation: 123

If you've already added the files, git's ls-files command will list them, and its output can be piped through other tools to get what you need. https://git-scm.com/docs/git-ls-files

I would suggest setting up a .gitignore file to wildcard out any obvious ones before the first big git add.

Use the -s switch to list the staged items, extract just the file paths with awk (this assumes the paths contain no whitespace), and then use du to get the file sizes:

git ls-files -s | awk -F' ' '{ print $4 }' | xargs du -ch 

Dropping -h (human-readable) from du leaves all the values in the same unit (1 KiB blocks by default with GNU du), which allows sort -n to be used, putting the largest at the bottom:

git ls-files -s | awk -F' ' '{ print $4 }' | xargs du -c | sort -n

The sorted output can then be used to pick out the large files and unstage them with git reset <file>. Decide how many rows to remove from the bottom of the output; this could be done more cleverly, but here tail takes the last 7 rows and head then drops the final "total" row, leaving the 6 largest files:

git ls-files -s | awk -F' ' '{ print $4 }' | xargs du -c | sort -n | tail -7 | head -6 | awk -F' ' '{ print $2 }' | xargs git reset
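The pipelines above split on whitespace, so a path containing spaces would be cut short. A minimal sketch of a whitespace-safe variant follows, assuming an xargs that supports -0 (GNU and BSD both do). The helper name ls_sizes is hypothetical, and the sizes come from the working-tree copies, so they can differ from the staged blobs if a file was edited after git add.

```shell
# Hypothetical helper: print "<bytes><TAB><path>" for every file that
# git ls-files reports, largest first. Extra arguments are passed straight
# through to git ls-files. NUL-delimited paths keep names with spaces intact.
ls_sizes() {
  git ls-files -z "$@" |
    xargs -0 sh -c '
      for f in "$@"; do
        if [ -f "$f" ]; then
          # wc -c reads the working-tree copy; tr strips padding some wc add
          printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        fi
      done' sh |
    sort -rn
}
```

For example, ls_sizes | head -n 10 shows the ten largest staged files, and ls_sizes --others --exclude-standard lists untracked files that .gitignore does not exclude, which covers the "to be staged" case before the first big git add.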

Upvotes: 3

samplebias

Reputation: 37919

You could try this. It finds all files larger than 1M and sorts them from largest to smallest. The file sizes printed are in bytes:

cd ~/files_to_archive
find . -type f -size +1M -printf '%s %p\n' |sort -nr

Output:

74751072 ./linux-2.6.38-rc4.tar.bz2
34686037 ./git-source.tar.gz
14026384 ./Python-2.7.tar.gz

Updated: pass the files returned by find to git ls-files to print their git status (NUL-delimited, so paths with spaces survive):

find . -type f -size +1M -print0 | xargs -0 git ls-files -t

Upvotes: 3

Jonathan Leffler

Reputation: 755010

To a first approximation, du -sk . at the top of the directory tree will give you the space needed. After you do git gc, it might be an overestimate.
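As a rough sketch of that comparison, assuming the files have already been committed: the hypothetical helper repo_vs_tree below prints the du figure next to the packed size that git count-objects -v reports as size-pack (in KiB). Note that du -sk . also counts the .git directory itself.

```shell
# Hypothetical helper: compare the directory size with the packed
# repository size, both in KiB. Run it from the top of the working tree.
repo_vs_tree() {
  # Total for the directory, including .git
  tree_kib=$(du -sk . | cut -f1)
  # Pack loose objects so that size-pack reflects the compressed history
  git gc --quiet
  pack_kib=$(git count-objects -v | awk '/^size-pack:/ { print $2 }')
  printf 'directory: %s KiB, packed objects: %s KiB\n' "$tree_kib" "$pack_kib"
}
```

If the packed figure is much smaller than the directory figure, the du estimate was pessimistic for that content.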

But you should have been using version control long before you reached the point of retiring the project.

Upvotes: 0

atx

Reputation: 5079

I don't know about Git, but if you're using Mercurial, you could use a combination of:

ls -laS
hg status

Upvotes: -1
