Reputation: 8983
I need to generate a sequence of all numbers, matching a pattern in a certain number range. For example:
[0-9]*9[0-9]*
)Of course I could use a brute-force approach where I loop through all numbers of the range and test each number against the pattern. Here an example implementation done in Python:
def brute_force(start, end, limit, pattern):
if start < pattern:
start = pattern
if pattern > end:
return []
pattern_str = str(pattern)
generator = (n for n in range(start, end) if pattern_str in str(n))
return list(next(generator) for _ in range(limit))
Unfortunately, the range is quite large (10^7 numbers) and I have to do that frequently in my program with changing patterns. Therefore, I need a more efficient approach.
It is important to note that I need to have a sorted list and that I often only need the X first matching numbers (see limit parameter in example above).
I guess there is some kind of standard algorithm for this kind of problem, but I don't know what to search for. Any ideas?
Upvotes: 4
Views: 2995
Reputation: 23955
My attempt at a "next lexicographic" algorithm for a fixed length string.
JavaScript code:
function f(str,pat){
if (str[0] == 0
|| isNaN(Number(str))
|| isNaN(Number(pat))
|| str.length == pat.length
|| str == String(Math.pow(10,str.length - pat.length) - 1 + pat) ){
return str;
}
var AP = "",
i = 0;
while (!AP.match(pat) && i < str.length){
AP += str[i++];
}
// if there are digits to the right of the pattern
if (AP.length < str.length){
// increment the string if the pattern won't break
if (String(Number(str) + 1).match(pat)){
return Number(str) + 1;
// otherwise, find first number that may be incremented
} else {
var i = str.length - pat.length - 1
+ (pat.match(/^9+/) ? pat.match(/9+/)[0].length : 0)
while (str[i] == 9 && i-- >= 0){
}
// increment at i, move pattern to the right, set zeros in between
return (Number(str.substr(0,i + 1)) + 1)
* Math.pow(10,str.length - pat.length)
+ Number(pat)
}
// if the pattern is all the way on the right
} else {
// find rightmost placement for the pattern along an adjacent
// match where the string could be incremented
i = 1;
for (;i<pat.length; i++){
var j = str.length - pat.length - i,
patShifted = Number(pat) * Math.pow(10,i);
if (str.substr(j,i) == pat.substr(0,i)
&& patShifted > Number(str.substr(j))){
return Number(str.substr(0,j) + String(patShifted));
}
}
// if no left-shifted placement was found
return (Number(str.substr(0,str.length - pat.length)) + 1)
* Math.pow(10,pat.length) + Number(pat);
}
}
Upvotes: 1
Reputation: 58221
Here's a general solution that generates numbers up to a fixed size matching *(pat)*
where pat
is any sequence of digits.
It generates the numbers in increasing order, without duplicates. Because it's a generator, one can easily use it to generate the first few results without generating the entire list. (See the last example below).
def numbers(pat, k, matched=1):
if (matched >> len(pat)) & 1:
for x in xrange(10 ** k):
yield x
return
if not ((matched << k) >> len(pat)):
return
for d in xrange(10):
nm = 1
for i, pd in enumerate(pat):
if pd == str(d) and (matched >> i) & 1:
nm |= 1 << (i+1)
for n in numbers(pat, k - 1, nm):
yield 10 ** (k - 1) * d + n
It works by recording how much of pat
has been matched so far. To avoid backtracking when pat
has been partially matched and the next digit does not match, it keeps a bitmap of all possible partial matches so far, in much the same way that a (non-backtracking) regexp matcher does.
Here's some test code, that checks it against the slow but obviously correct implementation.
def numbers_slow(pat, k):
for i in xrange(10 ** k):
if pat in str(i):
yield i
test_cases = [
('9', 3),
('9', 4),
('123', 4),
('22', 4),
]
for pat, k in test_cases:
got = list(numbers(pat, k))
want = list(numbers_slow(pat, k))
if got != want:
print 'numbers(%s, %d) = %s, want %s' % (pat, k, got, want)
And here's the example given in the question:
print list(numbers('9', 3))
[9, 19, 29, 39, 49, 59, 69, 79, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 109,
119, 129, 139, 149, 159, 169, 179, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 209, 219, 229, 239, 249, 259, 269, 279, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 309, 319, 329, 339, 349, 359, 369, 379, 389, 390, 391,
392, 393, 394, 395, 396, 397, 398, 399, 409, 419, 429, 439, 449, 459, 469, 479,
489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 509, 519, 529, 539, 549,
559, 569, 579, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 609, 619,
629, 639, 649, 659, 669, 679, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698,
699, 709, 719, 729, 739, 749, 759, 769, 779, 789, 790, 791, 792, 793, 794, 795,
796, 797, 798, 799, 809, 819, 829, 839, 849, 859, 869, 879, 889, 890, 891, 892,
893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908,
909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924,
925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940,
941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956,
957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972,
973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988,
989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999]
Here's a less easy case:
print list(numbers('11', 3))
[11, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 211, 311, 411, 511, 611,
711, 811, 911]
And a case that shows the method is efficient even for large numbers of digits:
print list(numbers('121212121', 10))
[121212121, 1121212121, 1212121210, 1212121211, 1212121212, 1212121213,
1212121214, 1212121215, 1212121216, 1212121217, 1212121218, 1212121219,
2121212121, 3121212121, 4121212121, 5121212121, 6121212121, 7121212121,
8121212121, 9121212121]
And to demonstrate the ability to limit the generated results, here's the first 21 numbers up to 20 digits long that contain '100000':
print itertools.islice(numbers('100000', 20), 21)
[100000, 1000000, 1000001, 1000002, 1000003, 1000004, 1000005, 1000006, 1000007,
1000008, 1000009, 1100000, 2100000, 3100000, 4100000, 5100000, 6100000, 7100000,
8100000, 9100000, 10000000]
Upvotes: 3
Reputation: 23955
Here's the pattern:
1009
^-- 0-8
1090
^-- 0-8
1109
^-- 0-8
1190
^-- 0-8
...
1209
^-- 2-8 and the rightmost two digit pattern repeats till 1900
1900
01-99 this is different
--------------------------------------------------------------
^-- 2-8 Now the whole previous structure repeats for each thousand until 9000
9000
^-- 001-999
We can formulate a recursion for n < 10^m and n has m digits
:
f(n = 10^m - 1)
=> 9*10^(m-1) + 1..10^(m-2)-1 // that's like the 9000 + (001..999)
=> 8*10^(m-1) + f(n-1) // now for each leftmost digit down to 1,
=> ... // we append all the results from f(n-1)
=> 1*10^(m-1) + f(n-1)
Ruby code:
def f m
if m == 1
return [9]
end
result = []
((10**m).to_i - 1).downto(9 * 10**(m-1).to_i).each do |i|
result << i
end
temp = 10**(m-1)
8.downto(0).each do |i|
f(m-1).each do |j|
result << (i * temp + j)
end
end
result
end
Results:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
=> :f
> f 3
=> [999, 998, 997, 996, 995, 994, 993, 992, 991, 990, 989, 988, 987, 986, 985, 984, 983
, 982, 981, 980, 979, 978, 977, 976, 975, 974, 973, 972, 971, 970, 969, 968, 967, 966
, 965, 964, 963, 962, 961, 960, 959, 958, 957, 956, 955, 954, 953, 952, 951, 950, 949
, 948, 947, 946, 945, 944, 943, 942, 941, 940, 939, 938, 937, 936, 935, 934, 933, 932
, 931, 930, 929, 928, 927, 926, 925, 924, 923, 922, 921, 920, 919, 918, 917, 916, 915
, 914, 913, 912, 911, 910, 909, 908, 907, 906, 905, 904, 903, 902, 901, 900, 899, 898
, 897, 896, 895, 894, 893, 892, 891, 890, 889, 879, 869, 859, 849, 839, 829, 819, 809
, 799, 798, 797, 796, 795, 794, 793, 792, 791, 790, 789, 779, 769, 759, 749, 739, 729
, 719, 709, 699, 698, 697, 696, 695, 694, 693, 692, 691, 690, 689, 679, 669, 659, 649
, 639, 629, 619, 609, 599, 598, 597, 596, 595, 594, 593, 592, 591, 590, 589, 579, 569
, 559, 549, 539, 529, 519, 509, 499, 498, 497, 496, 495, 494, 493, 492, 491, 490, 489
, 479, 469, 459, 449, 439, 429, 419, 409, 399, 398, 397, 396, 395, 394, 393, 392, 391
, 390, 389, 379, 369, 359, 349, 339, 329, 319, 309, 299, 298, 297, 296, 295, 294, 293
, 292, 291, 290, 289, 279, 269, 259, 249, 239, 229, 219, 209, 199, 198, 197, 196, 195
, 194, 193, 192, 191, 190, 189, 179, 169, 159, 149, 139, 129, 119, 109, 99, 98, 97
, 96, 95, 94, 93, 92, 91, 90, 89, 79, 69, 59, 49, 39, 29, 19, 9]
> f(4).length
=> 3439
Upvotes: 1
Reputation: 2346
Since you only need the first x
numbers, lets look at it from the point of view of the number of the digits.
One digit is easy.
For two digits, get all options for the additional digits needed (8 possibilities). Then, working from the smallest to the largest, get the combination with the smallest value. Then iterate again with the combination with the bigger value.
The case for three digits or more is the same as the case for two digits.
Example - three digits:
counter = 0;
options = generate_all_combinations_for_k_digits_in_order(2) //for the 3 digit case.
//options = [10, 11, 12, ..., 20, 21, ..., 91, 92, ..., 99]
for (int i = 0; i < number_of_digits - 1; i++)
{
for (int j = 0; j < options.length; j++)
{
print_9_in_position_i_of_number_j(i,j);
counter++;
if (counter == x)
break; // End the loop somehow...
}
}
Upvotes: 1