Quinten
Quinten

Reputation: 41533

Why are some signed integer faster than unsigned integer types in Julia?

I have been studying the Abstract Types and mainly the difference between signed integer types and unsigned integer types. I was really curious about the performance of Julia so I measured the differences between those types. Since Julia is mostly implemented in C, I would assume almost the same behavior in abstract types. This question: performance of unsigned vs signed integers give some explanation about that unsigned integers leads to the same or better performance than signed. But in my benchmarking, I found that some Integers are faster than unsigned Integers. Here is some reproducible code:

julia> using BenchmarkTools

julia> Int_8 = rand(Int8, 1000)
julia> Int_16 = rand(Int16, 1000)
julia> Int_32 = rand(Int32, 1000)
julia> Int_64 = rand(Int64, 1000)
julia> Int_128 = rand(Int128, 1000)

julia> UInt_8 = rand(UInt8, 1000)
julia> UInt_16 = rand(UInt16, 1000)
julia> UInt_32 = rand(UInt32, 1000)
julia> UInt_64 = rand(UInt64, 1000)
julia> UInt_128 = rand(UInt128, 1000)

julia> @benchmark Int_8.^2
BenchmarkTools.Trial: 10000 samples with 147 evaluations.
 Range (min … max):  698.333 ns … 54.836 μs  ┊ GC (min … max):  0.00% … 98.34%
 Time  (median):     733.088 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   870.727 ns ±  2.209 μs  ┊ GC (mean ± σ):  11.56% ±  4.49%

  ▇█▅▄▂▁ ▇▇▄▂▁       ▂▁ ▁   ▁▁                                 ▂
  ██████████████▇▆▆▆██████▇██████▇▆▅▆▆▄▄▅▆▆▆▄▄▅▆▆▆▆▆▅▅▄▅▄▄▄▅▃▄ █
  698 ns        Histogram: log(frequency) by time      1.28 μs <

 Memory estimate: 1.14 KiB, allocs estimate: 5.

julia> @benchmark Int_16.^2
BenchmarkTools.Trial: 10000 samples with 16 evaluations.
 Range (min … max):  986.625 ns …  3.895 ms  ┊ GC (min … max):  0.00% … 99.93%
 Time  (median):       1.077 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):     3.069 μs ± 77.347 μs  ┊ GC (mean ± σ):  50.40% ±  2.00%

  ██▃              ▂▅▆▆▅▄▄▃▂▂▁                                 ▂
  ████▆▇▆▅▅▅▆▅▅▃▆▆▄████████████▇▅▅▅▅▄▄▆▇▆▆▇▇▇█▇▇▆▆▅▆▄▅▅▄▅▅▆▄▅▅ █
  987 ns        Histogram: log(frequency) by time      3.68 μs <

 Memory estimate: 2.14 KiB, allocs estimate: 5.

julia> @benchmark Int_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.327 μs …  4.701 ms  ┊ GC (min … max):  0.00% … 99.89%
 Time  (median):     1.474 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.481 μs ± 80.410 μs  ┊ GC (mean ± σ):  39.99% ±  1.73%

  ▆█▆▃               ▁▄▄▄▄▃▃▂▁▁                              ▁
  █████▇▆▆▆▆▅▄▅▅▅▄▄▃▅███████████▇▇▅▅▅▆▆▆▄▅▅▆▅▆▄▆▅▅▇▆▅▅▅▆▅▅▆▆ █
  1.33 μs      Histogram: log(frequency) by time     5.82 μs <

 Memory estimate: 4.14 KiB, allocs estimate: 5.

julia> @benchmark Int_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.749 μs …  2.949 ms  ┊ GC (min … max):  0.00% … 99.73%
 Time  (median):     1.887 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   5.069 μs ± 84.920 μs  ┊ GC (mean ± σ):  52.89% ±  3.15%

  ▇█▆▃▁                       ▂▃▃▃▂▂▂▁▁                      ▂
  █████▇▇▇█▇▆▅▅▅▅▅▄▅▄▄▅▁▃▅▃▅▅███████████▇▇▆▅▃▅▄▁▄▅▄▅▆▅▇▆▅▅▆▆ █
  1.75 μs      Histogram: log(frequency) by time     7.31 μs <

 Memory estimate: 8.02 KiB, allocs estimate: 5.

julia> @benchmark Int_128.^2
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):   8.516 μs …  5.690 ms  ┊ GC (min … max):  0.00% … 99.72%
 Time  (median):      8.778 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   10.962 μs ± 96.809 μs  ┊ GC (mean ± σ):  15.28% ±  1.73%

  ▄█▅▃▁                                                       ▁
  ██████▇▇▆▆▆▆▇▆▆▆▆▅▄▃▃▄▃▂▃▄▄▄▄▃▄▄▅▆▆▆▆▆▆▆▄▅▄▄▃▃▄▆▆▆▆▆▅▄▄▅▆▆▆ █
  8.52 μs      Histogram: log(frequency) by time      19.2 μs <

 Memory estimate: 15.83 KiB, allocs estimate: 5.

julia> @benchmark UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 104 evaluations.
 Range (min … max):  845.058 ns … 76.695 μs  ┊ GC (min … max): 0.00% … 98.21%
 Time  (median):     897.909 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.027 μs ±  2.643 μs  ┊ GC (mean ± σ):  9.58% ±  3.68%

  ██▇▆▃▆█▇▅▄▃▂▁       ▁▁▁▁▁▂▁▁ ▁                               ▃
  ███████████████▇▇▆▇█████████████▇▆▆▆▆▅▅▆▆▆▆▆▆▅▆▆▆▅▅▆▅▅▄▅▄▅▁▄ █
  845 ns        Histogram: log(frequency) by time      1.59 μs <

 Memory estimate: 1.14 KiB, allocs estimate: 5.

julia> @benchmark UInt_16.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.032 μs …  6.110 ms  ┊ GC (min … max):  0.00% … 99.96%
 Time  (median):     1.139 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.458 μs ± 83.275 μs  ┊ GC (mean ± σ):  47.86% ±  1.41%

   ██▅▂▁            ▁▂▃▂▂▁▁                                  ▂
  ███████▆▆▆▆▆▄▇▆▆▇███████████▆▄▅▆▆▄▄▄▃▃▃▄▃▄▃▅▁▃▄▅▃▃▄▄▅▄▅▆▆▆ █
  1.03 μs      Histogram: log(frequency) by time     3.83 μs <

 Memory estimate: 2.14 KiB, allocs estimate: 5.

julia> @benchmark UInt_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.205 μs …   4.694 ms  ┊ GC (min … max):  0.00% … 99.91%
 Time  (median):     1.332 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.821 μs ± 101.562 μs  ┊ GC (mean ± σ):  59.39% ±  2.23%

   ▇█▆▃                      ▁▂▂▂▂▂▁                          ▂
  ▇██████▆▇▇▆▆▆▇▇▅▅▄▄▁▃▃▄▄▃▄▇█████████▇▇▇▆▇▅▅▅▄▁▄▄▄▆▇▆▆▆▄▅▆▆▆ █
  1.2 μs       Histogram: log(frequency) by time      4.42 μs <

 Memory estimate: 4.14 KiB, allocs estimate: 5.

julia> @benchmark UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.695 μs …  2.845 ms  ┊ GC (min … max):  0.00% … 99.81%
 Time  (median):     1.842 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.734 μs ± 84.073 μs  ┊ GC (mean ± σ):  56.11% ±  3.16%

  ▅██▆▃▁                                 ▁▁▁▁                ▂
  ███████▇▆▆▇▇▆▆▄▄▅▄▃▄▅▆▆▅▅▁▃▃▃▄▄▄▁▅▃▅▅▇█████████▇▇▇▆▆▄▆▆▄▄▃ █
  1.7 μs       Histogram: log(frequency) by time     5.71 μs <

 Memory estimate: 8.02 KiB, allocs estimate: 5.

julia> @benchmark UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):   8.474 μs …  4.597 ms  ┊ GC (min … max):  0.00% … 99.69%
 Time  (median):      8.691 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   10.980 μs ± 87.765 μs  ┊ GC (mean ± σ):  15.96% ±  1.99%

  ▇█▆▄▃▁▁▁                                                    ▁
  █████████████▇█▇▇▇▆▆▆▅▅▄▆▄▄▄▄▅▄▄▂▄▅▄▄▄▅▆▇▇▇▆▇▇█▇▇▇▆▆▆▆▆▇▅▅▄ █
  8.47 μs      Histogram: log(frequency) by time        17 μs <

 Memory estimate: 15.83 KiB, allocs estimate: 5.

julia> @btime $Int_8.^2
  91.008 ns (1 allocation: 1.06 KiB)

julia> @btime $Int_16.^2
  319.261 ns (1 allocation: 2.06 KiB)

julia> @btime $Int_32.^2
  535.193 ns (1 allocation: 4.06 KiB)

julia> @btime $Int_64.^2
  923.778 ns (1 allocation: 7.94 KiB)

julia> @btime $Int_128.^2
  7.706 μs (1 allocation: 15.75 KiB)

julia> @btime $UInt_8.^2
  89.870 ns (1 allocation: 1.06 KiB)

julia> @btime $UInt_16.^2
  316.495 ns (1 allocation: 2.06 KiB)

julia> @btime $UInt_32.^2
  539.642 ns (1 allocation: 4.06 KiB)

julia> @btime $UInt_64.^2
  872.708 ns (1 allocation: 7.94 KiB)

julia> @btime $UInt_128.^2
  7.061 μs (1 allocation: 15.75 KiB)

As you can see UInt8, UInt16 is slightly faster than Int8, Int16. But Int32 becomes faster than UInt32. Is there a reason for that? Int64 and Int128 become slower again. As you can see the performance changes at some point between unsigned integers and signed integers and crosses.


So I was wondering if anyone could explain why some signed Integers are faster than unsigned Integers and vice versa? Should there not be a more linear difference, because now it crosses performances within the types?


Edit: add benchmark like @Shayan mentioned in the comments

Here I added the benchmark. It seems that UInt8 and UInt32 of unsigned integers are faster and Int64 and Int128 Integers are faster, so still some difference between these Abstract Types.

julia> @benchmark $Int_8.^2
BenchmarkTools.Trial: 10000 samples with 958 evaluations.
 Range (min … max):   81.209 ns …   9.620 μs  ┊ GC (min … max):  0.00% … 97.35%
 Time  (median):     115.392 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   221.416 ns ± 855.670 ns  ┊ GC (mean ± σ):  43.50% ± 10.98%

  █▃                                                            ▁
  ██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▄▃▄▅▆ █
  81.2 ns       Histogram: log(frequency) by time       7.39 μs <

 Memory estimate: 1.06 KiB, allocs estimate: 1.

julia> @benchmark $Int_16.^2
BenchmarkTools.Trial: 10000 samples with 241 evaluations.
 Range (min … max):  312.037 ns … 259.117 μs  ┊ GC (min … max):  0.00% … 99.79%
 Time  (median):     363.174 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.899 μs ±  18.716 μs  ┊ GC (mean ± σ):  77.72% ±  7.83%

  ▄▆██▃    ▂▂▁                                    ▁             ▂
  █████████████▇▆▆▆█▇▇▆▆▅▆▆▅▄▅▆▅▅▄▄▅▄▃▄▄▄▄▆▅▆▆▆▇▇██████▇▇▇▇▆▅▅▄ █
  312 ns        Histogram: log(frequency) by time       1.35 μs <

 Memory estimate: 2.06 KiB, allocs estimate: 1.

julia> @benchmark $Int_32.^2
BenchmarkTools.Trial: 9034 samples with 194 evaluations.
 Range (min … max):  520.572 ns … 251.764 μs  ┊ GC (min … max):  0.00% … 99.71%
 Time  (median):     583.026 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.824 μs ±  22.005 μs  ┊ GC (mean ± σ):  77.60% ±  9.84%

  ▇█▄▁▂▃▁ ▁▁                                                    ▁
  ████████████▇▆▇▆▅▃▅▄▅▄▃▄▄▄▄▄▁▄▅▆▆▆▆▇▇▇▆▆▅▄▅▅▅▃▄▄▁▁▃▁▄▁▁▁▁▁▁▁▃ █
  521 ns        Histogram: log(frequency) by time       3.06 μs <

 Memory estimate: 4.06 KiB, allocs estimate: 1.

julia> @benchmark $Int_64.^2
BenchmarkTools.Trial: 10000 samples with 44 evaluations.
 Range (min … max):  867.795 ns … 653.472 μs  ┊ GC (min … max):  0.00% … 99.82%
 Time  (median):       1.009 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     3.341 μs ±  36.073 μs  ┊ GC (mean ± σ):  68.34% ±  6.30%

  ▃▆▇██▆▆▄▂▁              ▁                                     ▂
  ██████████▇▆▆▇▇▆▇▇▆▅▅▇▇█████▇▇▅▆▄▅▄▃▄▄▃▃▄▁▅▆▇▅▄▅▄▄▅▅▁▄▅▃▃▃▄▄▃ █
  868 ns        Histogram: log(frequency) by time       2.94 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

julia> @benchmark $Int_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):   6.683 μs …  3.352 ms  ┊ GC (min … max):  0.00% … 99.74%
 Time  (median):      7.721 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   10.562 μs ± 96.224 μs  ┊ GC (mean ± σ):  27.31% ±  2.99%

  ▇▂▅▅   █▅                                                   ▁
  ████▇▄▆███▇▅▆▆▅▅▆▃▄▂▄▅▄▄▄▄▄▄▄▅▅▃▄▃▄▄▂▄▂▃▂▄▃▃▆▆▆▆▄▄▆▆▇▇▆▆▆▅▇ █
  6.68 μs      Histogram: log(frequency) by time      14.9 μs <

 Memory estimate: 15.75 KiB, allocs estimate: 1.

julia> @benchmark $UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 967 evaluations.
 Range (min … max):   80.428 ns …  10.368 μs  ┊ GC (min … max):  0.00% … 98.18%
 Time  (median):     108.816 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   217.285 ns ± 835.138 ns  ┊ GC (mean ± σ):  43.71% ± 11.08%

  █▃                                                            ▁
  ██▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▄▅▅▅▅ █
  80.4 ns       Histogram: log(frequency) by time       7.23 μs <

 Memory estimate: 1.06 KiB, allocs estimate: 1.

julia> @benchmark $UInt_16.^2
BenchmarkTools.Trial: 8270 samples with 320 evaluations.
 Range (min … max):  315.113 ns … 202.421 μs  ┊ GC (min … max):  0.00% … 99.80%
 Time  (median):     361.734 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.899 μs ±  16.395 μs  ┊ GC (mean ± σ):  79.09% ±  9.07%

  ▅▆█▆▂▃▁▂▂▁                                                    ▁
  ██████████▇▆▆███▇▆▆▆▆▅▅▅▅▅▄▁▅▄▄▄▅▃▄▃▃▄▄▅▆▅▆▆▇▇▇▇▇▇▇█▆▇▆▆▄▆▅▄▆ █
  315 ns        Histogram: log(frequency) by time       1.35 μs <

 Memory estimate: 2.06 KiB, allocs estimate: 1.

julia> @benchmark $UInt_32.^2
BenchmarkTools.Trial: 9148 samples with 194 evaluations.
 Range (min … max):  525.278 ns … 236.308 μs  ┊ GC (min … max):  0.00% … 99.74%
 Time  (median):     582.335 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.782 μs ±  21.821 μs  ┊ GC (mean ± σ):  78.12% ±  9.84%

  ▇█▄▂▂▃▁  ▁                                                    ▁
  ███████▇████▇▆▆▅▆▆▄▄▄▄▄▄▃▃▃▄▃▃▄▅▅▆▅▅▅▆▆▅▆▅▁▃▄▁▁▁▁▁▁▄▁▁▃▁▁▁▁▁▃ █
  525 ns        Histogram: log(frequency) by time       2.92 μs <

 Memory estimate: 4.06 KiB, allocs estimate: 1.

julia> @benchmark $UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 45 evaluations.
 Range (min … max):  859.467 ns … 592.092 μs  ┊ GC (min … max):  0.00% … 99.80%
 Time  (median):     988.933 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):     3.448 μs ±  36.452 μs  ┊ GC (mean ± σ):  70.21% ±  6.60%

    ▁▃▄█▃                                                        
  ▃▅█████▇▅█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂ ▃
  859 ns           Histogram: frequency by time          2.3 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

julia> @benchmark $UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):   7.691 μs …   4.544 ms  ┊ GC (min … max):  0.00% … 99.77%
 Time  (median):      7.794 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   11.989 μs ± 115.585 μs  ┊ GC (mean ± σ):  27.18% ±  2.82%

  █▄▃▃▂▂▁                  ▁                                   ▁
  █████████████▇▇▇▇▇▆▇▆▇▆▇██▇▇██▇▆▇▅▆▅▅▅▆▅▅▅▅▅▄▅▅▃▄▅▅▅▅▅▅▅▄▅▄▅ █
  7.69 μs       Histogram: log(frequency) by time      20.6 μs <

 Memory estimate: 15.75 KiB, allocs estimate: 1.

Upvotes: 1

Views: 250

Answers (1)

Luis Colorado
Luis Colorado

Reputation: 12708

That's not a type related question, the implementation of signed vs. unsigned numbers is not a language related one, but a compiler implementation issue in relation to the target architecture.

Assume (it's not the case, but applies equally) that your cpu had no support for signed numbers (or unsigned) As C standars establishes that both signed and unsigned numbers must be implemented, one of them will be directly mapped to the architecture type handling them and the other will have to be simulated in software. The problem was the same as when we hadn't floating point support in the cpus. You cannot live in C if you cannot do floating point math (well, it's possible that you can, but I couldn't) In those cases, the compiler (or the operating system) installs trap handlers to intercept the execution of the coprocessor instructions to handle them in software. You will see how your benchmarks get down in a non simple way.

Upvotes: 1

Related Questions