devops

Reputation: 1201

How can I find out git performance?

As a DevOps admin, what are the ways to check Git performance in my environment?

After every major change, such as a Git upgrade, I want to run a test that tells me how my Git installation is doing.
How can I achieve this?

Upvotes: 1

Views: 4879

Answers (4)

David Ongaro

Reputation: 3938

While git-hyperfine looks interesting, my impression is that it's mainly a tool for Git developers. As a user, I think it's easier to just stick with vanilla hyperfine. E.g.

$ hyperfine --warmup 3 -L rev /usr/bin/git,/opt/homebrew/Cellar/git/2.42.0/bin/git '{rev} status'
Benchmark 1: /usr/bin/git status
  Time (mean ± σ):      87.3 ms ±   1.9 ms    [User: 32.1 ms, System: 327.4 ms]
  Range (min … max):    84.9 ms …  93.7 ms    32 runs

Benchmark 2: /opt/homebrew/Cellar/git/2.42.0/bin/git status
  Time (mean ± σ):      76.0 ms ±   3.6 ms    [User: 28.9 ms, System: 301.0 ms]
  Range (min … max):    72.7 ms …  94.7 ms    38 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  /opt/homebrew/Cellar/git/2.42.0/bin/git status ran
    1.15 ± 0.06 times faster than /usr/bin/git status

The main point is to use an actual benchmarking tool, not time, which doesn't report standard deviations and therefore doesn't give comparable results unless the difference is huge.

As for the test cases to use: I'm afraid you're on your own there, since that depends heavily on your use cases and repositories.
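That said, here is a minimal sketch of a before/after comparison over a few everyday commands (the Git build paths, the repository path, and the command list are placeholders to adapt to your environment):

# Compare two Git builds on a few everyday commands in one of your repos.
cd /path/to/your/repo &&
hyperfine --warmup 3 \
  -L rev /usr/bin/git,/opt/new-git/bin/git \
  '{rev} status' \
  '{rev} log --oneline -100' \
  '{rev} diff HEAD~10'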

Upvotes: 1

VonC

Reputation: 1324278

The 2022 answer is to use avar/git-hyperfine, a wrapper around sharkdp/hyperfine, a command-line benchmarking tool.

Illustration:

Git 2.38 (Q3 2022) allows large objects read from a packstream to be streamed straight into a loose object file, without having to keep them in-core as a whole.

The performance improvement is measured with git-hyperfine.

See commit aaf8122, commit 2b6070a, commit 97a9db6, commit a1bf5ca (11 Jun 2022) by Han Xin (chiyutianyi).
See commit 3c3ca0b, commit 21e7d88 (11 Jun 2022) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 73b9ef6, 14 Jul 2022)

unpack-objects: use stream_loose_object() to unpack large objects

Helped-by: Ævar Arnfjörð Bjarmason
Helped-by: Derrick Stolee
Helped-by: Jiang Xin
Signed-off-by: Han Xin
Signed-off-by: Ævar Arnfjörð Bjarmason

Make use of the stream_loose_object() function introduced in the preceding commit to unpack large objects.
Before this we'd need to malloc() the size of the blob before unpacking it, which could cause OOM with very large blobs.

We could use the new streaming interface to unpack all blobs, but doing so would be much slower, as demonstrated e.g. with this benchmark using git-hyperfine:

rm -rf /tmp/scalar.git &&
git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git &&
mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack &&
git hyperfine \
  -r 2 --warmup 1 \
  -L rev origin/master,HEAD -L v "10,512,1k,1m" \
  -s 'make' \
  -p 'git init --bare dest.git' \
  -c 'rm -rf dest.git' \
  './git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/scalar.git/my.pack'

Here, with this change, we'll perform worse in terms of speed at lower core.bigFileThreshold settings, but we get lower memory use in return:

Summary
  './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master' ran
    1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    1.01 ± 0.02 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    1.02 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    1.09 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    1.10 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    1.11 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'

A better benchmark to demonstrate the benefits of this change is one which creates an artificial repo with a 1, 25, 50, 75 and 100MB blob:

rm -rf /tmp/repo &&
git init /tmp/repo &&
(
  cd /tmp/repo &&
  for i in 1 25 50 75 100
  do
      dd if=/dev/urandom of=blob.$i count=$(($i*1024)) bs=1024
  done &&
  git add blob.* &&
  git commit -mblobs &&
  git gc &&
  PACK=$(echo .git/objects/pack/pack-*.pack) &&
  cp "$PACK" my.pack
) &&
git hyperfine \
  --show-output \
  -L rev origin/master,HEAD -L v "512,50m,100m" \
  -s 'make' \
  -p 'git init --bare dest.git' \
  -c 'rm -rf dest.git' \
  '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum'

Using this test we'll always use >100MB of memory on origin/master (around ~105MB), but max out at e.g. ~55MB if we set core.bigFileThreshold=50m.

The relevant "Maximum resident set size" lines were manually added below the relevant benchmark:

'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master' ran
      Maximum resident set size (kbytes): 107080
  1.02 ± 0.78 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
      Maximum resident set size (kbytes): 106968
  1.09 ± 0.79 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
      Maximum resident set size (kbytes): 107032
  1.42 ± 1.07 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
      Maximum resident set size (kbytes): 107072
  1.83 ± 1.02 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
      Maximum resident set size (kbytes): 55704
  2.16 ± 1.19 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
      Maximum resident set size (kbytes): 4564

This shows that, if you have enough memory, this new streaming method is slower the lower you set the streaming threshold, but the benefit is more bounded memory use.

An earlier version of this patch introduced a new "core.bigFileStreamingThreshold" instead of re-using the existing "core.bigFileThreshold" variable.
As noted in a detailed overview of its uses in this thread, that variable has several different meanings.

Still, we consider it good enough to simply re-use it.
While it's possible that someone might want to, e.g., consider objects "small" for the purposes of diffing but "big" for the purposes of writing them, such use-cases are probably too obscure to worry about.
We can always split up "core.bigFileThreshold" in the future if there's a need for that.
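If memory use during unpacking is a concern in your environment, you can experiment with that threshold directly; a minimal sketch (the 50m value is purely illustrative, not a recommendation):

# Lower the threshold above which Git streams objects instead of holding
# them in memory (per-repository setting; the default is 512 MiB).
git config core.bigFileThreshold 50m

# Or set it for a single command, as in the benchmarks above:
git -c core.bigFileThreshold=50m unpack-objects < my.pack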

Upvotes: 1

VonC

Reputation: 1324278

Another way to test Git performance is to rely on the perf folder in Git's own test suite: the "Performance testing framework" introduced in 2012 with Git 1.7.10 (commit 342e9ef):

  • The 'run' script lets you specify arbitrary build dirs and revisions.
  • It also lets you specify which tests to run; or you can do it manually.
  • Two different sizes of test repos can be configured, and the scripts just copy one or more of those.

So... make perf
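Beyond a plain make perf, the run script in t/perf can compare selected tests across revisions; a minimal sketch, assuming a clone of git.git (the versions, repository path, and test name are illustrative):

# Compare two Git revisions on one perf test, pointing the framework
# at one of your own repositories as the test corpus.
cd t/perf &&
GIT_PERF_REPO=/path/to/your/repo ./run v2.41.0 v2.42.0 -- p0001-rev-list.sh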

Git 2.14 (Q3 2017) is still adding to that framework, with a test showing that runtimes of the wildmatch() function used for globbing in git grow exponentially in the face of some pathological globs.

See commit 62ca75a, commit 91de27c (11 May 2017) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 140921c, 30 May 2017)

Upvotes: 0

andy_l

Reputation: 61

Yes, as @DevidN pointed out, it depends on various parameters like configuration and network. I had the same question when we were migrating from SVN to Git and wanted stats after the migration.

I have used 'time' in combination with different Git commands and written a script to monitor all of those commands from the server.

Eg:

$ time git clone http://####@#########.git
Cloning into '#####'...
remote: Counting objects: 849, done.
remote: Compressing objects: 100% (585/585), done.
remote: Total 849 (delta 435), reused 0 (delta 0)
Receiving objects: 100% (849/849), 120.85 KiB | 0 bytes/s, done.
Resolving deltas: 100% (435/435), done.
Checking connectivity... done.

real    0m4.895s
user    0m0.140s
sys     0m0.046s
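A minimal sketch of such a wrapper script, assuming bash (the URL, log path, and command list are placeholders):

#!/bin/bash
# Hypothetical monitoring script: time a set of Git commands and append
# the results to a log file.
REPO_URL=http://example.com/repo.git   # placeholder
LOG=/var/log/git-perf.log              # placeholder

for cmd in "clone $REPO_URL /tmp/perf-test" "ls-remote $REPO_URL"; do
    echo "$(date) git $cmd" >> "$LOG"
    { time git $cmd ; } 2>> "$LOG"   # bash's time reports on stderr
    rm -rf /tmp/perf-test            # clean up between runs
done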

Upvotes: 1
