
Reputation: 15917

Fastest gap sequence for shell sort?

According to Marcin Ciura's Optimal (best known) sequence of increments for shell sort algorithm, the best sequence for shellsort is 1, 4, 10, 23, 57, 132, 301, 701..., but how can I generate such a sequence? In Marcin Ciura's paper, he said:

Both Knuth’s and Hibbard’s sequences are relatively bad, because they are defined by simple linear recurrences.

but most algorithm books I found tend to use Knuth’s sequence: k = 3k + 1, because it's easy to generate. What's your way of generating a shellsort sequence?

Upvotes: 25

Views: 25278

Answers (9)

Mark R

Reputation: 293

Inspired by the above answers, I wrote a program to benchmark various Shellsort gap sequences. The results don't really prove anything, of course, but I still found them interesting.

My program is written in C++. I used it to sort 1000 different randomly generated arrays; each array was sorted using each of several gap sequences. The array sizes were all either 1,000,000 or 1,000,001 elements. (I had heard it rumored that some gap sequences were sensitive to array size evenness/oddness.) The arrays contained 64-bit pointers to strings generated with a cryptographic-quality pseudo-random number generator. Thus, the cost of swapping elements was relatively low compared to the cost of a comparison.

The sequences tested were:

  • Ciura22: The well-known Ciura sequence that ends with 701, with subsequent gaps obtained by multiplying the previous gap by 2.20.
  • Ciura225: Similar to Ciura22, but the multiplier is 2.25.
  • Ciura235: Similar to Ciura22, but the multiplier is 2.35.
  • Ciura225Odd: Same as Ciura225, but the numbers after 701 are ORed with 1 to force them to be odd.
  • Jdaw1: The sequence described above by jdaw1.
  • Tokuda92:
  • Knuth73: Knuth’s sequence (3^k-1)/2.
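A rough sketch of how gap lists like these could be generated follows. It is not the benchmark program itself: the function names are hypothetical, and truncating the products, applying the OR before generating the next gap, and keeping only gaps below the array length are all assumptions made for illustration.

```python
CIURA = [1, 4, 10, 23, 57, 132, 301, 701]  # Ciura's published gaps


def extended_ciura(limit, multiplier=2.25, force_odd=False):
    """Ciura's gaps, then previous * multiplier, keeping gaps below limit.

    Assumptions: products are truncated to integers, and force_odd ORs each
    generated gap with 1 before it is used to produce the next one.
    """
    gaps = [g for g in CIURA if g < limit]
    gap = CIURA[-1]
    while True:
        gap = int(gap * multiplier)
        if force_odd:
            gap |= 1
        if gap >= limit:
            return gaps
        gaps.append(gap)


def knuth(limit):
    """Knuth's gaps (3^k - 1) / 2 for k = 1, 2, ... that are below limit."""
    gaps = []
    k = 1
    while (3**k - 1) // 2 < limit:
        gaps.append((3**k - 1) // 2)
        k += 1
    return gaps


print(extended_ciura(10_000))                  # Ciura225-style gaps
print(extended_ciura(10_000, force_odd=True))  # Ciura225Odd-style gaps
print(knuth(10_000))                           # Knuth73-style gaps
```

Passing `multiplier=2.2` or `multiplier=2.35` would give the other Ciura variants under the same assumptions.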

Here are the results from one run (1000 arrays):

Gap sequence   Recs/sec    Ave dev   Rel Perf
Ciura225Odd    1210516.1   0.463%    1.2453
Ciura225       1205463.9   0.639%    1.2401
Lee21          1204745.6   0.5%      1.2393
Ciura22        1200786.5   0.517%    1.2353
Ciura235       1198675.2   0.498%    1.2331
Tokuda92       1195739.4   0.444%    1.2301
Jdaw1          1184954.4   0.46%     1.2190
Knuth73         972096.6   0.856%    1.0000

Recs/sec is the number of array elements sorted per second for a 1M-element array, and Rel Perf is that rate relative to Knuth73. As you can see, the time to sort a random array was pretty consistent, with an average deviation from the mean of less than 1% over 1000 different arrays.

For what it’s worth, my test system has an Intel Xeon W-2150B CPU @ 3.00GHz, and I compiled using Xcode 14.3.

I ran the program several times with different seeds to the PRNG, and the results varied slightly. However, Ciura225Odd was always the fastest (barely), and Knuth73 was always the slowest.

Upvotes: 4


Reputation: 43

More information regarding jdaw1's post:

Gonnet and Baeza-Yates advise growth by a factor of about 2.2; Tokuda by 2.25. It is well known that if there is a mathematical constant between 2⅕ and 2¼ then it must† be precisely √5 ≈ 2.236.

Since √5 · √5 = 5, I think every other gap should grow by a factor of five. So the first gap is 1 (a plain insertion sort), the second is 3, and from there every other gap is 5 times the previous such gap. There follow the values up to 2⁶⁴ ≈ eighteen quintillion.

{1, 3,, 15,, 75,, 375,, 1 875,, 9 375,, 46 875,, 234 375,, 1 171 875,, 5 859 375,, 29 296 875,, 146 484 375,, 732 421 875,, 3 662 109 375,, 18 310 546 875,, 91 552 734 375,, 457 763 671 875,, 2 288 818 359 375,, 11 444 091 796 875,, 57 220 458 984 375,, 286 102 294 921 875,, 1 430 511 474 609 375,, 7 152 557 373 046 875,, 35 762 786 865 234 375,, 178 813 934 326 171 875,, 894 069 671 630 859 375,, 4 470 348 358 154 296 875,}

The values in the gaps can be calculated by multiplying the preceding value by √5 and rounding to the nearest whole number, that is, round(2.2360679775 · 3 · 5^n), which gives the resulting array:

{1, 3, 7, 15, 34, 75, 168, 375, 839, 1 875, 4 193, 9 375, 20 963, 46 875, 104 816, 234 375, 524 078, 1 171 875, 2 620 392, 5 859 375, 13 101 961, 29 296 875, 65 509 804, 146 484 375, 327 549 020, 732 421 875, 1 637 745 101, 3 662 109 375, 8 188 725 504, 18 310 546 875, 40 943 627 518, 91 552 734 375, 204 718 137 589, 457 763 671 875, 1 023 590 687 943, 2 288 818 359 375, 5 117 953 439 713, 11 444 091 796 875, 25 589 767 198 563, 57 220 458 984 375, 127 948 835 992 813, 286 102 294 921 875, 639 744 179 964 066, 1 430 511 474 609 375, 3 198 720 899 820 328, 7 152 557 373 046 875, 15 993 604 499 101 639, 35 762 786 865 234 375, 79 968 022 495 508 194, 178 813 934 326 171 875, 399 840 112 477 540 970, 894 069 671 630 859 375, 1 999 200 562 387 704 849, 4 470 348 358 154 296 875, 9 996 002 811 938 524 246}

(Obviously, omit those that would overflow the relevant array index type. So if that is a signed long long, omit the last.)
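A minimal sketch of generating this interleaved list is below. The function names are hypothetical, and instead of multiplying by a decimal approximation of √5 it rounds √(5·g²) with integer arithmetic, which is an assumption made so that large gaps are not affected by floating-point error.

```python
from math import isqrt


def nearest_int_sqrt(n):
    """Integer nearest to sqrt(n), computed without floating point."""
    r = isqrt(n)
    # sqrt(n) is closer to r + 1 exactly when n > r*r + r.
    return r + 1 if n - r * r > r else r


def sqrt5_gaps(limit):
    """1, then 3*5^n and round(3*5^n*sqrt(5)) alternately, all below limit."""
    gaps = [1]
    base = 3
    while base < limit:
        gaps.append(base)
        between = nearest_int_sqrt(5 * base * base)  # round(base * sqrt(5))
        if between < limit:
            gaps.append(between)
        base *= 5
    return gaps


print(sqrt5_gaps(10_000))
```

Passing the maximum value of the index type as `limit` takes care of the overflow caveat.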

Upvotes: 0


Reputation: 251

Sedgewick observes that coprimality is good. This rings true: if there are separate ‘streams’ not much cross-compared until the gap is small, and one stream contains mostly smalls and one mostly larges, then the small gap might need to move elements far. Coprimality maximises cross-stream comparison.

Gonnet and Baeza-Yates advise growth by a factor of about 2.2; Tokuda by 2.25. It is well known that if there is a mathematical constant between 2⅕ and 2¼ then it must† be precisely √5 ≈ 2.236.

So start {1, 3}, and then each subsequent is the integer closest to previous·√5 that is coprime to all previous except 1. This sequence can be pre-calculated and embedded in code. There follow the values up to 2⁶⁴ ≈ eighteen quintillion.

{1, 3, 7, 16, 37, 83, 187, 419, 937, 2099, 4693, 10499, 23479, 52501, 117391, 262495, 586961, 1312481, 2934793, 6562397, 14673961, 32811973, 73369801, 164059859, 366848983, 820299269, 1834244921, 4101496331, 9171224603, 20507481647, 45856123009, 102537408229, 229280615033, 512687041133, 1146403075157, 2563435205663, 5732015375783, 12817176028331, 28660076878933, 64085880141667, 143300384394667, 320429400708323, 716501921973329, 1602147003541613, 3582509609866643, 8010735017708063, 17912548049333207, 40053675088540303, 89562740246666023, 200268375442701509, 447813701233330109, 1001341877213507537, 2239068506166650537, 5006709386067537661, 11195342530833252689}

(Obviously, omit those that would overflow the relevant array index type. So if that is a signed long long, omit the last.)

On average these have ≈1.96 distinct prime factors and ≈2.07 non-distinct prime factors; 19/55 ≈ 35% are prime; and all but three are square-free (2⁴, 13·19² = 4693, 3291992692409·23³ ≈ 4.0·10¹⁶).
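The rule can be sketched as follows; this is an illustration under assumptions, not the code used to produce the list. `coprime_sqrt5_gaps` is a hypothetical name, and candidates are ranked by distance from previous·√5 using exact integer comparisons rather than floating point.

```python
from math import gcd, isqrt


def coprime_sqrt5_gaps(count):
    """First `count` gaps: 1, 3, then the nearest coprime to previous*sqrt(5)."""
    gaps = [1, 3]
    while len(gaps) < count:
        prev = gaps[-1]
        target_sq = 5 * prev * prev   # (prev * sqrt(5))^2 as an exact integer
        lo = isqrt(target_sq)         # largest integer below prev * sqrt(5)
        hi = lo + 1                   # smallest integer above it
        while True:
            # lo is the nearer candidate exactly when lo + hi > 2*prev*sqrt(5).
            if (lo + hi) ** 2 > 4 * target_sq:
                candidate, lo = lo, lo - 1
            else:
                candidate, hi = hi, hi + 1
            # gcd with the leading 1 is always 1, so it never blocks a candidate.
            if all(gcd(candidate, g) == 1 for g in gaps):
                gaps.append(candidate)
                break
    return gaps[:count]


print(coprime_sqrt5_gaps(12))
```

Because √5 is irrational, the two candidates are never equally distant, so the comparison never has to break a tie.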

I would welcome formal reasoning about this sequence.

† There’s a little mischief in this “well known … must”. Choosing a constant ∉ℚ guarantees that the closest number that is coprime cannot be a tie, but a rational with an odd denominator would achieve the same. And I like the simplicity of √5, though other possibilities include e^⅘, 11^⅓, π/√2, and √π divided by the Chow-Robbins constant. Simplicity favours √5.

Upvotes: 1

Olof Forshell

Reputation: 3274

I discussed this question here yesterday including the gap sequences I have found work best given a specific (low) n.

In the middle I write

A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps you may end up with either the best gaps for n entries or the best gaps for your set of combinations - most likely the latter.

The problem lies in testing the proposed gaps such that valid conclusions can be drawn. Obviously, testing the gaps against all n! orderings that a set of n unique values can be expressed as is unfeasible. Testing in this manner for n=16, for example, means that 20,922,789,888,000 different combinations of n values must be sorted to determine the exact average, worst and reverse-sorted cases - just to test one set of gaps and that set might not be the best. 2^(16-2) sets of gaps are possible for n=16, the first being {1} and the last {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1}.

To illustrate how using random combinations might give incorrect results, assume n=3, which has six different orderings: 012, 021, 102, 120, 201 and 210. You produce a set of two random sequences to test the two possible gap sets, {1} and {2,1}. Assume that these sequences turn out to be 021 and 201. For {1}, 021 can be sorted with three comparisons (02, 21 and 01) and 201 with three (20, 21 and 01), giving a total of six comparisons; divide by two and voilà, an average of 3 and a worst case of 3. Using {2,1} gives (01, 02, 21 and 01) for 021 and (21, 10 and 12) for 201: seven comparisons, with a worst case of 4 and an average of 3.5. The actual average and worst case for {1} are 8/3 and 3, respectively. For {2,1} the values are 10/3 and 4. The averages were too high in both cases and the worst cases were correct. Had 012 been one of the cases, {1} would have given a 2.5 average - too low.
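The exact figures for n=3 can be checked by brute force. The sketch below is only an illustration: `count_comparisons` and `exact_stats` are hypothetical helper names, and the sort is a plain gapped insertion sort that counts one comparison each time an element is compared with the one a gap before it.

```python
from fractions import Fraction
from itertools import permutations


def count_comparisons(values, gaps):
    """Shellsort a copy of `values` and return the number of comparisons."""
    a = list(values)
    comparisons = 0
    for gap in gaps:
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap:
                comparisons += 1
                if a[j - gap] <= item:
                    break
                a[j] = a[j - gap]  # shift the larger element up by one gap
                j -= gap
            a[j] = item
    return comparisons


def exact_stats(n, gaps):
    """Exact average and worst-case comparisons over all n! orderings."""
    counts = [count_comparisons(p, gaps) for p in permutations(range(n))]
    return Fraction(sum(counts), len(counts)), max(counts)


for gaps in ([1], [2, 1]):
    average, worst = exact_stats(3, gaps)
    print(gaps, "average", average, "worst", worst)
```

This reports an average of 8/3 and a worst case of 3 for {1}, and 10/3 and 4 for {2,1}. The same function becomes impractical quickly, since the number of orderings grows as n!.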

Now extend this to finding a set of random sequences for n=16 such that no set of gaps tested will be favored in comparison with the others and the result close (or equal) to the true values, all the while keeping processing to a minimum. Can it be done? Possibly. After all, everything is possible - but is it probable? I think that for this problem random is the wrong approach. Selecting the sequences according to some system may be less bad and might even be good.

Upvotes: 1


Reputation: 2901

I've found this sequence similar to Marcin Ciura's sequence:

1, 4, 9, 23, 57, 138, 326, 749, 1695, 3785, 8359, 18298, 39744, etc.

For comparison, Ciura's sequence is:

1, 4, 10, 23, 57, 132, 301, 701, 1750

Each term is a mean of prime numbers. Python code to find the means of prime numbers is here:

import numpy as np

def isprime(n):
    ''' Check if integer n is a prime '''
    n = abs(int(n))  # n is a positive integer
    if n < 2:  # 0 and 1 are not primes
        return False
    if n == 2:  # 2 is the only even prime number
        return True
    if not n & 1:  # all other even numbers are not primes
        return False
    # Range starts with 3 and only needs to go up the square root
    # of n for all odd numbers
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0:
            return False
    return True

# To apply a function to a numpy array, one has to vectorize the function
vectorized_isprime = np.vectorize(isprime)

a = np.arange(10000000)
primes = a[vectorized_isprime(a)]
for i in range(2,20):

The output is:


The ratio between successive gaps slowly decreases from about 2.5 toward 2. Maybe this association could improve Shellsort in the future.
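A minimal sketch of one construction that reproduces the terms 4, 9, 23, 57, 138 of the sequence above: take the mean of the first 2^i primes for i = 2, 3, ... and round down. That this is the exact rule behind the whole sequence is an assumption; `first_primes` and `prime_mean_gaps` are hypothetical helper names, and plain Python is used instead of numpy to keep the sketch self-contained.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes = []
    n = 2
    while len(primes) < count:
        # n is prime if no smaller prime up to sqrt(n) divides it.
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes


def prime_mean_gaps(terms):
    """Floor of the mean of the first 2^i primes, for i = 2 .. terms + 1."""
    primes = first_primes(2 ** (terms + 1))
    return [sum(primes[: 2**i]) // 2**i for i in range(2, terms + 2)]


print(prime_mean_gaps(8))
```

The leading 1 of the sequence is not produced by this rule and would have to be prepended.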

Upvotes: 1



The sequence is 1, 4, 10, 23, 57, 132, 301, 701, 1750. For every next number after 1750 multiply previous number by 2.25 and round down.
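As a sketch of how this could be used in a sort (an illustration, not a tuned implementation): `ciura_gaps` and `shellsort` are hypothetical names, "multiply by 2.25 and round down" is done with the integer expression `g * 9 // 4`, and only gaps smaller than the input length are kept.

```python
def ciura_gaps(n):
    """Gaps below n, largest first: Ciura's list, then floor(2.25 * previous)."""
    gaps = [1, 4, 10, 23, 57, 132, 301, 701, 1750]
    while gaps[-1] * 9 // 4 < n:
        gaps.append(gaps[-1] * 9 // 4)  # floor(2.25 * previous gap)
    return [g for g in reversed(gaps) if g < n]


def shellsort(items):
    """Return a sorted copy of `items` using the gaps from ciura_gaps."""
    a = list(items)
    for gap in ciura_gaps(len(a)):
        # Gapped insertion sort: each element moves left in steps of `gap`.
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
    return a


print(shellsort([5, 2, 9, 1, 5, 6, 0, 3]))
```

For a fixed maximum input size the gap list could equally be precomputed once and hardcoded.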

Upvotes: 2


Reputation: 49813

I would not be ashamed to take the advice given in Wikipedia's Shellsort article:

With respect to the average number of comparisons, the best known gap sequences are 1, 4, 10, 23, 57, 132, 301, 701 and similar, with gaps found experimentally. Optimal gaps beyond 701 remain unknown, but good results can be obtained by extending the above sequence according to the recursive formula h_k = \lfloor 2.25 h_{k-1} \rfloor.

Tokuda's sequence [1, 4, 9, 20, 46, 103, ...], defined by the simple formula h_k = \lceil h'_k \rceil, where h'_k = 2.25 h'_{k-1} + 1, h'_1 = 1, can be recommended for practical applications.

Guessing from the pseudonym, it seems Marcin Ciura edited the WP article himself.
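The Tokuda recurrence quoted above can be sketched in a few lines. `tokuda_gaps` is a hypothetical name, and exact fractions are used instead of floats, which is a choice made here so that the ceiling is never affected by rounding error.

```python
from fractions import Fraction
from math import ceil


def tokuda_gaps(limit):
    """Gaps ceil(h'_k) below limit, with h'_1 = 1 and h'_k = 2.25*h'_(k-1) + 1."""
    gaps = []
    h = Fraction(1)
    while ceil(h) < limit:
        gaps.append(ceil(h))
        h = Fraction(9, 4) * h + 1  # 2.25 * h + 1, kept exact
    return gaps


print(tokuda_gaps(1_000_000))
```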

Upvotes: 5


Reputation: 1151

If your data set has a definite upper bound in size, then you can hardcode the step sequence. You should probably only worry about generality if your data set is likely to grow without an upper bound.

The sequence shown seems to grow roughly as an exponential series, albeit with quirks. There seems to be a majority of prime numbers, but with non-primes in the mix as well. I don't see an obvious generation formula.

A valid question, assuming you must deal with arbitrarily large sets, is whether you need to emphasise worst-case performance, average-case performance, or almost-sorted performance. If the latter, you may find that a plain insertion sort using a binary search for the insertion step might be better than a shellsort. If you need good worst-case performance, then Sedgewick's sequence appears to be favoured. The sequence you mention is optimised for average-case performance, where the number of comparisons outweighs the number of moves.
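For reference, an insertion sort with a binary search for the insertion point can be sketched as follows. This is an illustration only; `binary_insertion_sort` is a hypothetical name, and whether it actually beats a shellsort on almost-sorted data would need to be measured.

```python
from bisect import bisect_right


def binary_insertion_sort(items):
    """Return a sorted copy of `items`, finding each slot by binary search."""
    a = list(items)
    for i in range(1, len(a)):
        item = a[i]
        # a[:i] is already sorted; bisect_right keeps equal elements in order.
        pos = bisect_right(a, item, 0, i)
        a[pos + 1 : i + 1] = a[pos:i]  # shift the tail right by one slot
        a[pos] = item
    return a


print(binary_insertion_sort([3, 1, 2, 5, 4]))
```

The binary search reduces comparisons to about log2(i) per element, but the number of element moves is the same as in a plain insertion sort.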

Upvotes: 6

John Feminella

Reputation: 311615

Ciura's paper generates the sequence empirically -- that is, he tried a bunch of combinations and this was the one that worked the best. Generating an optimal shellsort sequence has proven to be tricky, and the problem has so far been resistant to analysis.

The best known increment is Sedgewick's, which you can read about here (see p. 7).

Upvotes: 15
