Sebi
Sebi

Reputation: 8983

How to generate sequence of numbers matching a pattern?

I need to generate a sequence of all numbers, matching a pattern in a certain number range. For example:

Of course I could use a brute-force approach where I loop through all numbers of the range and test each number against the pattern. Here an example implementation done in Python:

def brute_force(start, end, limit, pattern):
    if start < pattern:
        start = pattern
    if pattern > end:
        return []

    pattern_str = str(pattern)        
    generator = (n for n in range(start, end) if pattern_str in str(n))
    return list(next(generator) for _ in range(limit))

Unfortunately, the range is quite large (10^7 numbers) and I have to do that frequently in my program with changing patterns. Therefore, I need a more efficient approach.

It is important to note that I need to have a sorted list and that I often only need the X first matching numbers (see limit parameter in example above).

I guess there is some kind of standard algorithm for this kind of problem, but I don't know what to search for. Any ideas?

Upvotes: 4

Views: 2995

Answers (4)

גלעד ברקן
גלעד ברקן

Reputation: 23955

My attempt at a "next lexicographic" algorithm for a fixed length string.

JavaScript code:

function f(str,pat){
  if (str[0] == 0 
   || isNaN(Number(str))
   || isNaN(Number(pat))
   || str.length == pat.length 
   || str == String(Math.pow(10,str.length - pat.length) - 1 + pat) ){
    return str;
  }

  var AP = "",
      i = 0;

  while (!AP.match(pat) && i < str.length){
    AP += str[i++];
  }

  // if there are digits to the right of the pattern
  if (AP.length < str.length){

    // increment the string if the pattern won't break
    if (String(Number(str) + 1).match(pat)){
      return Number(str) + 1;

    // otherwise, find first number that may be incremented
    } else {
      var i = str.length - pat.length - 1
            + (pat.match(/^9+/) ? pat.match(/9+/)[0].length : 0)

      while (str[i] == 9 && i-- >= 0){
      }

      // increment at i, move pattern to the right, set zeros in between
      return (Number(str.substr(0,i + 1)) + 1) 
           * Math.pow(10,str.length - pat.length) 
           + Number(pat)
    }

  // if the pattern is all the way on the right 
  } else {

    // find rightmost placement for the pattern along an adjacent
    // match where the string could be incremented
    i = 1;

    for (;i<pat.length; i++){
      var j = str.length - pat.length - i,
          patShifted = Number(pat) * Math.pow(10,i);

      if (str.substr(j,i) == pat.substr(0,i) 
       && patShifted > Number(str.substr(j))){

        return Number(str.substr(0,j) + String(patShifted));
      }
    }

    // if no left-shifted placement was found
    return (Number(str.substr(0,str.length - pat.length)) + 1)
          * Math.pow(10,pat.length) + Number(pat);
  }
}

Upvotes: 1

Paul Hankin
Paul Hankin

Reputation: 58221

Here's a general solution that generates numbers up to a fixed size matching *(pat)* where pat is any sequence of digits.

It generates the numbers in increasing order, without duplicates. Because it's a generator, one can easily use it to generate the first few results without generating the entire list. (See the last example below).

def numbers(pat, k, matched=1):
    if (matched >> len(pat)) & 1:
        for x in xrange(10 ** k):
            yield x
        return
    if not ((matched << k) >> len(pat)):
        return
    for d in xrange(10):
        nm = 1
        for i, pd in enumerate(pat):
            if pd == str(d) and (matched >> i) & 1:
                nm |= 1 << (i+1)
        for n in numbers(pat, k - 1, nm):
            yield 10 ** (k - 1) * d + n

It works by recording how much of pat has been matched so far. To avoid backtracking when pat has been partially matched and the next digit does not match, it keeps a bitmap of all possible partial matches so far, in much the same way that a (non-backtracking) regexp matcher does.

Here's some test code, that checks it against the slow but obviously correct implementation.

def numbers_slow(pat, k):
    for i in xrange(10 ** k):
        if pat in str(i):
            yield i

test_cases = [
    ('9', 3),
    ('9', 4),
    ('123', 4),
    ('22', 4),
]
for pat, k in test_cases:
    got = list(numbers(pat, k))
    want = list(numbers_slow(pat, k))
    if got != want:
        print 'numbers(%s, %d) = %s, want %s' % (pat, k, got, want)

And here's the example given in the question:

print list(numbers('9', 3))

[9, 19, 29, 39, 49, 59, 69, 79, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 109,
119, 129, 139, 149, 159, 169, 179, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 209, 219, 229, 239, 249, 259, 269, 279, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 309, 319, 329, 339, 349, 359, 369, 379, 389, 390, 391,
392, 393, 394, 395, 396, 397, 398, 399, 409, 419, 429, 439, 449, 459, 469, 479,
489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 509, 519, 529, 539, 549,
559, 569, 579, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 609, 619,
629, 639, 649, 659, 669, 679, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698,
699, 709, 719, 729, 739, 749, 759, 769, 779, 789, 790, 791, 792, 793, 794, 795,
796, 797, 798, 799, 809, 819, 829, 839, 849, 859, 869, 879, 889, 890, 891, 892,
893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908,
909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924,
925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940,
941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956,
957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972,
973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988,
989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999]

Here's a less easy case:

print list(numbers('11', 3))

[11, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 211, 311, 411, 511, 611,
711, 811, 911]

And a case that shows the method is efficient even for large numbers of digits:

print list(numbers('121212121', 10))

[121212121, 1121212121, 1212121210, 1212121211, 1212121212, 1212121213,
1212121214, 1212121215, 1212121216, 1212121217, 1212121218, 1212121219,
2121212121, 3121212121, 4121212121, 5121212121, 6121212121, 7121212121,
8121212121, 9121212121]

And to demonstrate the ability to limit the generated results, here's the first 21 numbers up to 20 digits long that contain '100000':

print itertools.islice(numbers('100000', 20), 21)

[100000, 1000000, 1000001, 1000002, 1000003, 1000004, 1000005, 1000006, 1000007,
1000008, 1000009, 1100000, 2100000, 3100000, 4100000, 5100000, 6100000, 7100000,
8100000, 9100000, 10000000]

Upvotes: 3

גלעד ברקן
גלעד ברקן

Reputation: 23955

Here's the pattern:

1009
  ^-- 0-8
1090
   ^-- 0-8

1109
  ^-- 0-8
1190
   ^-- 0-8
 ...
1209
 ^-- 2-8 and the rightmost two digit pattern repeats till 1900
1900
  01-99 this is different
 --------------------------------------------------------------
^-- 2-8 Now the whole previous structure repeats for each thousand until 9000

9000
 ^-- 001-999

We can formulate a recursion for n < 10^m and n has m digits:

f(n = 10^m - 1)
   => 9*10^(m-1) + 1..10^(m-2)-1  // that's like the 9000 + (001..999)
   => 8*10^(m-1) + f(n-1)         // now for each leftmost digit down to 1, 
   => ...                         // we append all the results from f(n-1)
   => 1*10^(m-1) + f(n-1)

Ruby code:

def f m    
  if m == 1
   return [9]
  end

  result = []

  ((10**m).to_i - 1).downto(9 * 10**(m-1).to_i).each do |i|
    result << i
  end

  temp = 10**(m-1)

  8.downto(0).each do |i|
    f(m-1).each do |j|
      result << (i * temp + j)
    end
  end

  result
end

Results:

ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

=> :f
 > f 3
=> [999, 998, 997, 996, 995, 994, 993, 992, 991, 990, 989, 988, 987, 986, 985, 984, 983
   , 982, 981, 980, 979, 978, 977, 976, 975, 974, 973, 972, 971, 970, 969, 968, 967, 966
   , 965, 964, 963, 962, 961, 960, 959, 958, 957, 956, 955, 954, 953, 952, 951, 950, 949
   , 948, 947, 946, 945, 944, 943, 942, 941, 940, 939, 938, 937, 936, 935, 934, 933, 932
   , 931, 930, 929, 928, 927, 926, 925, 924, 923, 922, 921, 920, 919, 918, 917, 916, 915
   , 914, 913, 912, 911, 910, 909, 908, 907, 906, 905, 904, 903, 902, 901, 900, 899, 898
   , 897, 896, 895, 894, 893, 892, 891, 890, 889, 879, 869, 859, 849, 839, 829, 819, 809
   , 799, 798, 797, 796, 795, 794, 793, 792, 791, 790, 789, 779, 769, 759, 749, 739, 729
   , 719, 709, 699, 698, 697, 696, 695, 694, 693, 692, 691, 690, 689, 679, 669, 659, 649
   , 639, 629, 619, 609, 599, 598, 597, 596, 595, 594, 593, 592, 591, 590, 589, 579, 569
   , 559, 549, 539, 529, 519, 509, 499, 498, 497, 496, 495, 494, 493, 492, 491, 490, 489
   , 479, 469, 459, 449, 439, 429, 419, 409, 399, 398, 397, 396, 395, 394, 393, 392, 391
   , 390, 389, 379, 369, 359, 349, 339, 329, 319, 309, 299, 298, 297, 296, 295, 294, 293
   , 292, 291, 290, 289, 279, 269, 259, 249, 239, 229, 219, 209, 199, 198, 197, 196, 195
   , 194, 193, 192, 191, 190, 189, 179, 169, 159, 149, 139, 129, 119, 109, 99, 98, 97
   , 96, 95, 94, 93, 92, 91, 90, 89, 79, 69, 59, 49, 39, 29, 19, 9]
 > f(4).length
=> 3439

Upvotes: 1

shapiro yaacov
shapiro yaacov

Reputation: 2346

Since you only need the first x numbers, lets look at it from the point of view of the number of the digits.

One digit is easy.
For two digits, get all options for the additional digits needed (8 possibilities). Then, working from the smallest to the largest, get the combination with the smallest value. Then iterate again with the combination with the bigger value. The case for three digits or more is the same as the case for two digits.

Example - three digits:

counter = 0;
options = generate_all_combinations_for_k_digits_in_order(2) //for the 3 digit case.
//options = [10, 11, 12, ..., 20, 21, ..., 91, 92, ..., 99]
for (int i = 0; i < number_of_digits - 1; i++)
{
    for (int j = 0; j < options.length; j++)
    {
        print_9_in_position_i_of_number_j(i,j);
        counter++;
        if (counter == x)
            break; // End the loop somehow... 
    }
}

Upvotes: 1

Related Questions